Preface



This year the SOFSEM conference is coming back to Milovy in Moravia to
be held for the 26th time. Although born as a local Czechoslovak event 25 years
ago, SOFSEM did not miss the opportunity offered in 1989 by the newly found
freedom in our part of Europe and has evolved into a full-fledged international 
conference. For all the changes, however, it has kept its generalist and multi- 
disciplinary character. The tracks of invited talks, ranging from Trends in Theory 
to Software and Information Engineering, attest to this. Apart from the topics 
mentioned above, SOFSEM ’99 offers invited talks exploring core technologies, 
talks tracing the path from data to knowledge, and those describing a wide 
variety of applications. 

The rich collection of invited talks presents one traditional facet of SOFSEM: 
that of a winter school, in which IT researchers and professionals get an oppor- 
tunity to see more of the large pasture of today’s computing than just their 
favourite grazing corner. To facilitate this purpose the prominent researchers 
delivering invited talks usually start with a broad overview of the state of the 
art in a wider area and then gradually focus on their particular subject. 

Apart from being a winter school, SOFSEM is a conference where outstanding 
submitted contributions selected by the programme committee are presented 
in shorter contributed talks also published in the proceedings. This year the 
23 members of the programme committee, coming from 8 countries, evaluated 
45 submissions, with several sub-referees joining the effort. After careful review, 
followed by a comprehensive discussion at a PC meeting held at the end of June, 
18 contributions were selected for presentation at SOFSEM ’99. At SOFSEM the 
invited and contributed talks will be supplemented, as usual, by refereed posters 
and by flash communications. 

Throughout its history, SOFSEM has served as a meeting ground for profes- 
sionals from both the theoretical and the practical side of computing. Alas, the 
fence dividing those two still exists after all these years. Yes, there have always 
been holes in it and some of you may insist that they are getting larger lately. 
Having lived on both sides I find it difficult to judge; at any rate I believe that 
placing SOFSEM squarely across the fence is and will remain useful. 

There are many people whose contribution to the preparation of SOFSEM ’99 
I acknowledge gratefully - and gladly. Gerard Tel as the co-chair has done much 
to make the programme committee work smoothly and productively. He has been 
a great help and a great pal, taking the initiative whenever he sensed (rightly) 
that it was time to step in. The members of the PC were hard working and 
disciplined. Some were special: Peter van Emde Boas and Torben Hagerup went 
well beyond their assignments. The names of all PC members as well as those 
of the sub-referees are recorded elsewhere. Together they made the selection 
process, rather difficult this year, balanced and fair. Having a paper refused
is not exactly an exhilarating experience, but more often than not the author
received a detailed analysis of the contribution, suggesting improvements and
providing hints for further research. 

It has been proven possible to have a SOFSEM without Jan Staudek leading
the organizing committee, without Jiří Sochor serving as the PC secretary, or
without Miroslav Bartošek taking care of the proceedings. Nevertheless, I am
glad I was lucky enough not to have to try. Working with Jan, Jiří, and Miroslav
saved me much sweat and some embarrassment. Now that Springer-Verlag
(whose continuing trust in SOFSEM is very much appreciated) publishes LNCS
in both printed and electronic form, editing the proceedings is more complicated;
fortunately we had Petr Sojka and Aleš Křenek to help, with Petr Sojka taking
over most of the burden.

To my colleagues in the SOFSEM steering committee I owe my gratitude not 
only for guidance and advice, but also for recruiting most of the invited speakers. 

During the transition to an international event we have been able to stick to 
the tradition of making SOFSEM affordable to the public from Central and East- 
ern Europe and even support the participation of graduate and post-graduate 
students financially. This is due to the generosity of our sponsors whose names 
are also recorded in these proceedings. 

Having said my thanks I now get to wishes. I recall Jan Staudek responding to 
complaints about meeting in Milovy again and again by noting that the Hawaii 
Conference on System Sciences does not move from Hawaii and nobody seems to 
protest. Well, I must admit that apart from clear air and abundance of vegetation 
our venue does not offer the sort of attraction Hawaii does. On the other hand, 
less tourist attraction means less distraction, making the participants huddle 
together during whatever free time the busy schedule of SOFSEM allows for. 
This can be used in various productive ways, from planning joint research and
projects to sitting at the feet of the veterans and soaking up war stories and 
tricks of the trade. I hope the participants will enjoy all that SOFSEM ’99 has 
to offer, both on and off its time-table. 
Abstract. Quantum information processing research has brought deep
insights into the quantum potentials of Nature for computing, revealed
new information processing primitives, principles, concepts and methods, and
also brought some spectacular results. On the other hand, the problems
connected with the design of powerful quantum processors are still seen
as immense.

The aim of the talk is to present and to analyse the current situation
in quantum information processing and, especially, to point out the main
quantum challenges that need to be attacked in order to make further
progress in this fascinating and promising area.



1 Prologue 



There are times on which a genius would wish to live. 
It is not in the still calm of life, or in the response of 
a pacific situation, that great challenges are formed. . . . 
Great necessities call out great virtues. 

Abigail Adams (1744-1818)

An understanding of the fundamentals, principles, laws and limitations of
quantum information processing (QIP), as well as the utilization of its
potentials, is one of the main challenges of current science and technology in
general, and of computing and quantum physics in particular.

On one side, there is a strong feeling, based on historical experience
and on the recent progress in the area, that the current merging of arguably
the two most powerful scientific and technological developments of the 20th
century, quantum physics and computing, will converge into a fruitful marriage
with strong impacts on both areas, with a better understanding of the nature of
the physical and information processing worlds and, perhaps, with a new
technology that could represent a computer revolution.

On the other side, the problems that need to be overcome in order to make this
vision come true are still overwhelming, especially on the experimental and
technological side, but not only there. In order to overcome these difficulties,
if that is at all fully possible, it is useful to identify carefully the main
quantum (information processing) challenges and to concentrate on dealing with
them.

This paper was written during the author's stay at the University of Nice, Sophia
Antipolis, within the PAST program. Support of the GACR grant No. 201/98/0369
and of the grant CEZ:J07/98:143300001 is gratefully acknowledged.

J. Pavelka, G. Tel, M. Bartošek (Eds.): SOFSEM’99, LNCS 1725, pp. 1-28, 1999.

(c) Springer-Verlag Berlin Heidelberg 1999

There are many ways to illustrate the scale of the potential that
developments in quantum computing can provide and, by that, also the potential
merit of such quantum challenges. Let us start with a general and global view
of the history of mankind that distinguishes three basic eras:

Neolithic era. Progress was made on the basis of the development of
means to ensure that mankind had enough food, whenever needed.

Industrial era. Progress has been made on the basis of the development
of means to ensure that mankind has had enough energy, whenever needed.

Information era. Progress is being made on the basis of the development
of means to ensure that mankind has enough information, whenever needed.

The above views point out the key role of information for the future. The 
next global view concerns the history and the nature of computing. 

19th century: Computing was seen as mental processes.

20th century: Computing has been seen as machine processes.

21st century: Computing will be seen as Nature processes.

The last view points out that in order to make progress in information
processing we have to look more closely into Nature and try to make better use
of its natural information processing capabilities. The next view points out one
direction to go in making use of the information processing potentials of Nature.
(Biocomputing seems to be another direction and challenge.)

19th century: Progress was made by the conscious application of classical
mechanics (Newton equations and thermodynamics), to create a basis for
the industrial revolution.

20th century: Progress has been made by a conscious application of modern
mechanics (Maxwell equations and electromagnetism), to create energy and
information distribution networks and tools and to utilize them in order to
create a basis for the information revolution.

21st century: Progress is expected to be made by a conscious application of
quantum mechanics (Schrödinger equation and information processing), to
create a basis for the quantum information processing revolution.

In the complex world we live in, one often needs simple (but proper) “slogans”
to see the right way to go. The slogans presented above suggest that quantum
information processing is a big challenge and, naturally, that great ideas in
this field may have large impacts. The developments of the last years in this
area also suggest that, in spite of many deep and surprising results and
discoveries, the field is at its beginning and “the floor is open for new great
ideas to come”.

There are also urgent practical needs to pursue as much as possible the
idea of designing quantum computers. Indeed, for progress in physics and
other sciences it would be very useful to be able to use quantum computers to
simulate quantum phenomena, which classical computers cannot do (efficiently).
In addition, it is becoming clear that in order to keep making progress in
computing technology according to Moore's law for a while longer, we need to
go to the quantum level for computing power, one way or another.

2 Introduction 



Progress in science is often made by pessimists. 

Progress in technology is mainly made by optimists. 

Two outcomes, one on the theoretical level and one on the experimental level,
that appeared at about the same time, around 1994, acted as apt killers for
quantum information processing, a fascinating challenge that has been developing
rapidly since then. Interestingly enough, both of these results are of deep
interest for the security of communication, one of the key problems to deal with
in order to put the communication of the global information society on a secure
basis. The first of these results was Shor’s quantum polynomial time algorithms
for factorization and discrete logarithm computation, which would make it
possible, if quantum computers were available, to break many of the currently
used systems for encryption and digital signatures. The second of these results
was the successful transmission of photons over several kilometers, which could
be a basis for new ways of quantum key generation, representing a significantly
new dimension in the security of communication, in which undetectable
eavesdropping is impossible on the basis of physical laws.

Interestingly enough, the basics of quantum mechanics needed to develop quantum
information processing have been known for more than 60 years. From this point
of view it may seem a bit surprising that neither Turing nor von Neumann, one of
the fathers of both fast universal computers and modern quantum mechanics, tried
to develop computing principles on the laws of quantum physics, which would have
corresponded better to the knowledge of the physical world in their times;
instead they built on classical physics. However, a closer look at the overall
state of knowledge and the practical needs of their times shows quite clearly
why the idea of quantum computing became of broader interest only around 1994-5.
At the times of Turing and von Neumann, it did not seem feasible that quantum
laws, due to the requirement of reversibility of quantum evolution, the
randomness of quantum measurements, the uncertainty principle and non-locality,
could be utilized to make reversible quantum evolution perform precise,
universal and powerful computations. This started to be seen as possible only
after Bennett’s result^1 in 1973 that universal reversible Turing machines
exist. Moreover, it was practically impossible to develop a sufficiently
justified belief that it could pay off to try to overcome the enormous
technological problems on the way to the design of powerful quantum computers,
which clearly need to be dealt with, until the theory of computational
complexity, and especially the theory of randomized computational complexity,
was sufficiently developed, which happened only in the years 1980-1990. At that
time physics hardly had available the concepts and tools to develop a deeper and
broader understanding of the potential quantum computing could provide. Finally,
it was just around 1994-5 that experience in experimental atomic physics, optics
and nuclear magnetic resonance had developed to the point where one could start
to consider, at least in a rudimentary form, the building of experimental
quantum gates and processors. For example, the ion trap technology, the first
one suggested for experiments in quantum information processing (by Cirac and
Zoller, in 1995), was developed only shortly before that. One can therefore
conclude that without advances in randomized complexity theory and in
experimental atomic physics and optics, it was practically impossible for the
idea of quantum information processing to gain significant momentum, no matter
how theoretically natural it had already been for some years.

^1 Those older and well-known references that cannot be found in the list of
references of this paper can be found in [21].

The potentials of quantum computers seem to be immense. A quantum computer
with a “tiny” quantum memory of 200-1000 qubits could be used to break some
important current cryptosystems. Even a quantum computer with a “toy-size”
memory of 10-20 qubits could be used to perform significant quantum experiments
to enhance our knowledge of quantum physics and of other areas of science.
However, the problems with designing quantum computers of significant power also
seem to be immense. For example, very hard problems arise in providing long term
quantum memory, in making particles interact to perform conditional quantum
gates, in fighting decoherence and in quantum measurement, to mention some.
These problems are so difficult, and seem to grow so fast with the number of
qubits involved and the computation time used, that in spite of the fact that,
step by step, 2-, 3-, 5- and even 7-qubit experiments have recently been
reported [32], the new quantum initiative in Europe, within the 5th Framework
program, suggested in November 1998 to put on the agenda for the next 4 years
the development of only a 4-qubit quantum processor. In addition, as already
mentioned, it is still far from clear whether the design of a really powerful
quantum computer is fully possible at all. The situation seems to be much better
with quantum cryptography, especially in the area of quantum key generation.^2

Though it is practically impossible to foresee particular developments, as is
usually the case in such scientific adventures, one can say quite safely that in
any case we can expect a significant increase of our theoretical knowledge of the
physical and information processing worlds, their laws and limitations, and that
at least some byproducts of such research and development
will have important impacts in various areas, as discussed later.

In any case quantum information processing is a fascinating intellectual
adventure. Problems range from such esoteric ones as counterfactual
computations, quantum teleportation, non-locality, Zeno effect impacts and
quantum entanglement, to such prosaic ones as how to fight decoherence, how to
set up the initial (quasi) pure state and how to perform readouts
(measurements).

^2 S. Braunstein is quoted as saying “We may have quantum telephones sooner than
quantum computers”.

3 State of the Art 

Let us first review briefly some basics of QIP and also some of its main principles, 
outcomes and obstacles. 



3.1 Mathematical Framework 

Ideal quantum information processing is performed in Hilbert spaces, which
correspond to isolated quantum systems. Real quantum information processing
has to be performed in the real world, in a very noisy environment where
entanglement with the environment and decoherence dominate. Realistic research
in quantum information processing has to take all this into consideration.

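Hilbert space is a mathematical framework to describe concepts and processes
of isolated quantum systems. It is a complex linear vector space $H$ on which an
inner product $\langle\cdot|\cdot\rangle : H \times H \to \mathbf{C}$ is defined
such that for any $\phi, \psi, \phi_1, \phi_2 \in H$ and $c_1, c_2 \in \mathbf{C}$
it holds:
(1) $\langle\phi|\psi\rangle = \langle\psi|\phi\rangle^*$;
(2) $\langle\psi|\psi\rangle \ge 0$;
(3) $\langle\psi|\psi\rangle = 0$ iff $\psi = 0$;
(4) $\langle\psi|c_1\phi_1 + c_2\phi_2\rangle = c_1\langle\psi|\phi_1\rangle + c_2\langle\psi|\phi_2\rangle$.
(A quantum interpretation of $|\langle\phi|\psi\rangle|^2$ is as the probability
that the state $|\psi\rangle$ evolves into the state $|\phi\rangle$.) Using the
inner product one can define on $H$ a norm $\|\phi\| = \sqrt{\langle\phi|\phi\rangle}$,
a distance $\|\phi - \psi\|$ and on this basis a topology. An element $\phi$ of
$H$, represented usually by a column vector, is often denoted as $|\phi\rangle$.
The notation $\langle\phi|$ is used for a linear functional on $H$, represented by
a row vector, such that $\langle\phi|(|\psi\rangle) = \langle\phi|\psi\rangle$ for
any $|\psi\rangle \in H$.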
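The definitions above can be tried out numerically. The following is a minimal sketch in plain Python, assuming states are represented as lists of complex amplitudes; the helper names `inner` and `norm` are illustrative and do not come from the paper.

```python
import math

def inner(phi, psi):
    """Inner product <phi|psi>: conjugate-linear in phi, linear in psi."""
    return sum(complex(a).conjugate() * b for a, b in zip(phi, psi))

def norm(phi):
    """Norm ||phi|| = sqrt(<phi|phi>)."""
    return math.sqrt(inner(phi, phi).real)

phi = [1 / math.sqrt(2), 1j / math.sqrt(2)]   # an example unit vector
psi = [1.0, 0.0]                              # the basis state |0>

# Property (1): <phi|psi> is the complex conjugate of <psi|phi>.
assert abs(inner(phi, psi) - inner(psi, phi).conjugate()) < 1e-12

# |<phi|psi>|^2, interpreted in the text as a transition probability.
prob = abs(inner(phi, psi)) ** 2
```

For these two vectors the transition probability evaluates to 1/2, and `norm(phi)` is 1, so `phi` is a pure state in the sense defined next.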

Elements of $H$ of norm 1 are called pure states and they correspond to the
states of a closed quantum system. Two states $|\phi\rangle$ and $|\psi\rangle$
are called orthogonal if $\langle\phi|\psi\rangle = 0$. Physically, only
orthogonal states are well distinguishable.

To a (bipartite) quantum system $S$ that is composed of quantum systems
$S_1$ and $S_2$ corresponds a Hilbert space $H$ that is the tensor product
$H_A \otimes H_B$ of the corresponding Hilbert spaces $H_A$ and $H_B$; its
elements are linear combinations of tensor products of the vectors of both
Hilbert spaces.^3 Elements of a two-dimensional Hilbert space $H_2$ are called
qubits and they are denoted by $\alpha|0\rangle + \beta|1\rangle$, where
$|0\rangle$ and $|1\rangle$ stand for two basis states of $H_2$. A state of
$H_{2^n}$, an n-fold tensor product of $H_2$, is called an n-qubit register
state. Its general form is

$$|\phi\rangle = \sum_{i=0}^{2^n-1} \alpha_i |i\rangle,$$

where $\alpha_i$ are complex numbers and $\sum_{i=0}^{2^n-1} |\alpha_i|^2 = 1$.

^3 The tensor product $u \otimes v$ of n-dimensional vectors $u = (u_1, \ldots, u_n)$
and $v = (v_1, \ldots, v_n)$ is an $n^2$-dimensional vector with components
$(u_1v_1, \ldots, u_1v_n, u_2v_1, \ldots, u_2v_n, \ldots, u_nv_1, \ldots, u_nv_n)$.

An evolution of a quantum system (Hilbert space) is represented by a unitary
matrix (operator) $U$ (such that $UU^* = U^*U = I$, where $U^*$ is the conjugate
transpose of $U$). A unitary evolution preserves the scalar product and
represents a reversible process. Unitary operators therefore perform rotations.
A quantum evolution is a deterministic process.
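As an illustration of the tensor product and of unitary evolution, here is a small sketch in plain Python. It assumes the one-qubit Hadamard matrix as the example unitary; the helper names `tensor`, `apply` and `inner` are hypothetical and chosen only for this example.

```python
import math

def tensor(u, v):
    """Tensor product with components u[i]*v[j], ordered as in the footnote."""
    return [a * b for a in u for b in v]

def apply(U, state):
    """Multiply the matrix U (a list of rows) by a column vector."""
    return [sum(row[j] * state[j] for j in range(len(state))) for row in U]

def inner(phi, psi):
    return sum(complex(a).conjugate() * b for a, b in zip(phi, psi))

s = 1 / math.sqrt(2)
H = [[s, s], [s, -s]]           # Hadamard matrix, a standard one-qubit unitary
ket0, ket1 = [1, 0], [0, 1]

plus = apply(H, ket0)           # (|0> + |1>)/sqrt(2)
minus = apply(H, ket1)          # (|0> - |1>)/sqrt(2)
register = tensor(plus, ket1)   # a 2-qubit register state with 4 amplitudes

# Unitarity preserves the scalar product: <0|1> = 0, so <plus|minus> = 0 too.
assert abs(inner(plus, minus)) < 1e-12
# Reversibility: applying H a second time returns the original state.
back = apply(H, plus)
```

Since the Hadamard matrix is its own inverse, `back` agrees with `ket0` up to rounding, which is the reversibility mentioned above in its simplest form.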

An interface between the classical and quantum worlds is made by quantum
measurements. A (von Neumann or projection) measurement on a Hilbert space is
represented by a Hermitian matrix (observable). A measurement of a state
$|\psi\rangle$ with respect to an observable $A$ in a Hilbert space $H$ has two
outcomes: (1) the state $|\psi\rangle$ is projected into one of the subspaces of
$H$ generated by the eigenvectors corresponding to an eigenvalue of $A$; (2)
classical information is produced about which subspace the projection took place
into. Projection into one of the possible subspaces is done randomly and the
corresponding probability is determined by the coefficients at the eigenvectors
of that subspace when $|\psi\rangle$ is expressed using an orthonormal basis of
the eigenvectors of $A$. In general, a projection measurement irreversibly
destroys a measured quantum state.
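A minimal sketch of such a measurement in plain Python follows. It assumes the simplest case, an observable whose eigenvectors are the standard basis states with distinct eigenvalues, so that each subspace is one-dimensional; the function name `measure` is illustrative.

```python
import random

def measure(state, rng=random):
    """Projective measurement in the standard basis.

    Returns the classical outcome (a basis index) and the projected state.
    """
    probs = [abs(a) ** 2 for a in state]        # squared moduli of coefficients
    outcome = rng.choices(range(len(state)), weights=probs)[0]
    post = [0.0] * len(state)
    post[outcome] = 1.0                         # the state after projection
    return outcome, post

# A qubit 0.6|0> + 0.8i|1>: outcome 1 is obtained with probability 0.64.
outcome, post = measure([0.6, 0.8j])
```

The original amplitudes cannot be recovered from `post`, which is the irreversibility noted in the text.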

Pure states, unitary operators and projection measurements are basic con- 
cepts of information processing in idealised closed quantum systems. On a more 
“real world level” we work with mixed states (density matrices), superoperators 
(certain positive and trace preserving mappings) and POV (positive operator 
valued) measurements (POVM). 

A mixed state $[\psi\rangle$ is a probability distribution on a set of pure
states (produced by an imperfect source), notation
$[\psi\rangle = \bigoplus_{i=1}^{k} (p_i, \phi_i)$, where $\sum_{i=1}^{k} p_i = 1$.
The interpretation is that a source produces the state $|\phi_i\rangle$ with
probability $p_i$. To each such mixed state $[\psi\rangle$ corresponds a density
matrix $\rho_{[\psi\rangle} = \sum_{i=1}^{k} p_i |\phi_i\rangle\langle\phi_i|$.
(One way to see how a density matrix emerges as a representation of a mixed
state $[\psi\rangle$ is through the average value of an observable $O$ on
$[\psi\rangle$: $\langle O\rangle_{[\psi\rangle} = \sum_{i=1}^{k} p_i \langle\phi_i|O|\phi_i\rangle = Tr(O\rho_{[\psi\rangle})$,
where $Tr(A)$, the trace of $A$, is the sum of the diagonal elements of a matrix
$A$.) Such a density matrix captures all and only the information that can be
obtained by an observer allowed to examine states from the given source
infinitely many times. Density matrices corresponding to two different mixed
states can be the same.
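The last remark can be checked on a small example. The sketch below, in plain Python, assumes two illustrative qubit sources that are not taken from the paper: one emitting |0> or |1> with equal probability, the other emitting (|0> + |1>)/sqrt(2) or (|0> - |1>)/sqrt(2) with equal probability. The helper names are hypothetical.

```python
import math

def outer(phi):
    """The projector |phi><phi| as a list of rows."""
    return [[a * complex(b).conjugate() for b in phi] for a in phi]

def density(mixture):
    """Density matrix sum_i p_i |phi_i><phi_i| for a list of (p_i, phi_i)."""
    n = len(mixture[0][1])
    rho = [[0j] * n for _ in range(n)]
    for p, phi in mixture:
        proj = outer(phi)
        for r in range(n):
            for c in range(n):
                rho[r][c] += p * proj[r][c]
    return rho

s = 1 / math.sqrt(2)
mix_a = [(0.5, [1, 0]), (0.5, [0, 1])]     # |0> or |1>, probability 1/2 each
mix_b = [(0.5, [s, s]), (0.5, [s, -s])]    # |+> or |->, probability 1/2 each
rho_a, rho_b = density(mix_a), density(mix_b)
```

Both mixtures give the same matrix, half of the identity, so no measurement statistics can tell the two sources apart.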

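The so-called Shannon entropy (usually called the von Neumann entropy) of the
mixed state $[\psi\rangle$, with density matrix $\rho$, is defined by

$$QS([\psi\rangle) = QS(\rho) = -Tr(\rho \lg \rho) = -\sum_{i=1}^{n} \lambda_i \lg \lambda_i,$$

where $\{\lambda_1, \ldots, \lambda_n\}$ is the multiset of eigenvalues of $\rho$.
($QS(\rho)$ represents the degree of ignorance embedded in the mixed state
$[\psi\rangle$ (density matrix $\rho_{[\psi\rangle}$).)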
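Given the eigenvalues of a density matrix, the entropy formula is a one-line computation. The following sketch in plain Python assumes the eigenvalues are already known and uses the usual convention that 0 lg 0 = 0; the function name `entropy` is illustrative.

```python
import math

def entropy(eigenvalues):
    """QS(rho) = -sum_i lambda_i * lg(lambda_i), taking 0 * lg 0 as 0."""
    return -sum(x * math.log2(x) for x in eigenvalues if x > 0)

pure = entropy([1.0, 0.0])       # a pure state: no ignorance
mixed = entropy([0.5, 0.5])      # the maximally mixed qubit state
```

A pure state has entropy 0, while the maximally mixed qubit state has entropy 1 bit, the largest possible value for a 2-dimensional system.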

Density matrices are also used to deal with the situation in which a composed
Hilbert space $A \otimes B$ is in a state $|\phi\rangle$. In such a case the
density matrix $\rho = Tr_B |\phi\rangle\langle\phi|$, where $Tr_B$ is the
tracing out operation over the subspace $B$, represents all and only the
information that can be obtained, given infinitely many opportunities to examine
it, about the subsystem $A$ of the system $A \otimes B$ prepared in the state
$|\phi\rangle$. The most general operators that can be applied to density
matrices are called superoperators, or trace-preserving completely positive
mappings. They can be seen as being performed by first making a composition of a
given quantum system with an auxiliary system, called ancilla, then applying to
such a composed system some unitary transformations and, finally, discarding the
auxiliary subsystem. Consequently, if a superoperator is applied to a quantum
state, then the resulting state can be in a Hilbert space of larger dimension
than that of the input state. Examples of superoperators are quantum encoders,
channels and decoders.
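The tracing out operation can be sketched directly from its definition. The example below, in plain Python, assumes a pure state of a composed system given as a flat list of amplitudes, with the basis state |a, b> stored at index a * dim_b + b; this layout and the function name are choices made for the illustration only.

```python
import math

def partial_trace_b(state, dim_a, dim_b):
    """Reduced density matrix Tr_B |phi><phi| for a pure state on A (x) B."""
    rho_a = [[0j] * dim_a for _ in range(dim_a)]
    for a1 in range(dim_a):
        for a2 in range(dim_a):
            # Sum over the basis of B, keeping the indices of A fixed.
            rho_a[a1][a2] = sum(
                state[a1 * dim_b + b] * complex(state[a2 * dim_b + b]).conjugate()
                for b in range(dim_b)
            )
    return rho_a

s = 1 / math.sqrt(2)
state = [0, s, -s, 0]            # the two-qubit state (|01> - |10>)/sqrt(2)
rho_a = partial_trace_b(state, 2, 2)
```

For this state the subsystem A alone is described by half of the identity matrix: the composed system is in a pure state, yet an observer with access only to A sees a maximally mixed state.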

POV measurements are the most general type of quantum measurement.
They can be seen as follows: one first composes an ancilla with the given
quantum system, then performs a unitary transformation on the composed quantum
system and, finally, performs a projection measurement on the resulting state of
the ancilla. (Technically, any such measurement in a d-dimensional Hilbert
space is given by a collection of positive semi-definite Hermitian matrices
$\{E_i\}_{i=1}^{k}$ such that $\sum_{i=1}^{k} E_i = I$. An important point is
that $k$ can be arbitrarily larger than $d$.) If, using such a measurement, a
system in state $\rho$ is measured, then the probability of obtaining the ith
result is $Tr(\rho E_i)$. For details see [21].
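A concrete instance with k larger than d may help. The sketch below, in plain Python, assumes a well-known three-element example on a qubit (often called the trine measurement), which is not taken from the paper; the function names are hypothetical.

```python
import math

def trine_povm():
    """Three elements E_k = (2/3)|v_k><v_k|, with real unit vectors v_k
    spaced 120 degrees apart in the plane."""
    elements = []
    for k in range(3):
        angle = 2 * math.pi * k / 3
        v = [math.cos(angle), math.sin(angle)]
        elements.append([[2 / 3 * a * b for b in v] for a in v])
    return elements

def outcome_probabilities(rho, povm):
    """Tr(rho E_i) for each element E_i."""
    n = len(rho)
    return [sum(rho[r][c] * E[c][r] for r in range(n) for c in range(n))
            for E in povm]

povm = trine_povm()
rho = [[1.0, 0.0], [0.0, 0.0]]   # the pure state |0><0|
probs = outcome_probabilities(rho, povm)
```

Here d = 2 and k = 3: the three elements sum to the identity, and measuring |0> gives the three results with probabilities 2/3, 1/6 and 1/6.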

3.2 Quantum Resources 

It has been demonstrated that by using quantum algorithms, networks and
communication protocols one can obtain a significant (up to exponential)
increase in performance: computations with fewer quantum operations and
communications with fewer bit (or qubit) exchanges. The power of quantum
communication and computation is the consequence of four quantum phenomena:

1. Quantum superposition. An n-qubit quantum register can be simultaneously
in a superposition of $2^n$ basis states $\{|x\rangle \mid x \in \{0,1\}^n\}$. In
addition, an equally weighted superposition of all basis states of $H_{2^n}$,

$$|\phi\rangle = \frac{1}{\sqrt{2^n}} \sum_{x \in \{0,1\}^n} |x\rangle,$$

an important initial state of many computations, can be created in a single
step, from a simple basis state $|0 \ldots 0\rangle$, using n simple one-qubit
(Hadamard) transformations
$H = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}$.

On the other side, a superposition of $2^n$ basis states can evolve in one step
into a superposition consisting of only one basis state. If one then makes
the measurement of the resulting state, with respect to the given basis, then
the outcome is obtained with probability 1. This is often used to get useful
classical information from a complex quantum evolution because most of the
quantum information contained in a quantum state is inaccessible.

2. Quantum parallelism. In one step of a quantum evolution exponentially
many classical computations can be “performed”. For example, for any function
$f : \{0, 1, \ldots, 2^n - 1\} \to \{0, 1, \ldots, 2^n - 1\}$, there is a unitary
mapping $U_f : |x, 0\rangle \to |x, f(x)\rangle$ and if $U_f$ is applied to an
exponentially large superposition

$$\frac{1}{\sqrt{2^n}} \sum_{i=0}^{2^n-1} |i, 0\rangle$$

then, in one step, we obtain the state

$$\frac{1}{\sqrt{2^n}} \sum_{i=0}^{2^n-1} |i, f(i)\rangle \qquad (1)$$

and therefore exponentially many values of $f$ can be computed in one step.

3. Quantum entanglement. Some quantum states of a bipartite quantum
system cannot be expressed as a tensor product of the states of its subsystems.
An example is the state $\frac{1}{\sqrt{2}}(|01\rangle - |10\rangle)$, called the
singlet. Such pure states are called entangled. Entangled states are a puzzling
phenomenon because they exhibit non-local features. Indeed, it may be the case
that two particles in the singlet state are (very) far from each other. In spite
of that, any measurement of one of the particles immediately determines the state
of the second particle.^4

The concept of entanglement, or inseparability, is defined also for density
matrices. A density matrix $\rho$ of a bipartite quantum system $A \otimes B$ is
inseparable if it cannot be written in the form
$\rho = \sum_i p_i \rho_i^A \otimes \rho_i^B$, where $\rho_i^A$ ($\rho_i^B$)
is a density matrix of the subsystem $A$ ($B$).

Quantum entanglement is considered a precious quantum resource due to
which quantum computation and communications can outperform classical
ones. At the same time quantum entanglement is an important demonstration
of the non-local features of quantum mechanics that have puzzled generations
of scientists. On the other side, it is to a large extent due to the existence
of entangled states that quantum computers are so hard to build!

The study of methods to create, purify, distribute and consume quantum
entanglement is among the most important current research subjects in the area
of quantum information processing.

4. Quantum measurements. Surprisingly, quantum measurement should
also be seen as a powerful computational primitive. Indeed, by performing
one measurement step one can achieve, in some cases, that a solution of a
complicated system of constraints is found and that this solution is
incorporated into the resulting projection (state). For example, by measuring
the state (1) with respect to the $f$-register, a solution of the system of
constraints

$$\bar{f} \in \{f(i) \mid 0 \le i < 2^n\}, \qquad I = \{i \mid f(i) = \bar{f}\}$$

is found and “included” into the resulting state
$\frac{1}{\sqrt{|I|}} \sum_{i \in I} |i, \bar{f}\rangle$.

^4 Quantum entanglement can be seen as representing an inherently quantum form of
information distinguished from the classical one also in the following sense: classical 
information can be copied, but can be transferred only forward in time. Quantum 
information cannot be copied, but can be used to connect any two points of space 
and time (and can therefore be seen as propagating also backwards in time, as in 
the case of quantum teleportation). 
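Items 1, 2 and 4 of the list above can be followed on a small classical simulation. The sketch below, in plain Python, assumes a register pair stored as a dictionary from (x, y) to amplitude and a toy function f(x) = x mod 3 on 3 qubits; the function, the data layout and all helper names are choices made for this illustration. A classical simulation of course needs time exponential in n, which is exactly what a quantum device would avoid.

```python
import math

def uniform_superposition(n):
    """The state (1/sqrt(2^n)) * sum_x |x, 0>, stored as {(x, y): amplitude}."""
    amp = 1 / math.sqrt(2 ** n)
    return {(x, 0): amp for x in range(2 ** n)}

def apply_uf(state, f):
    """U_f maps |x, 0> to |x, f(x)>; only inputs with y = 0 are handled here."""
    return {(x, f(x)): amp for (x, _), amp in state.items()}

def measure_f_register(state, value):
    """Project onto f-register = value and renormalise.

    Returns the probability of that outcome and the resulting state;
    assumes the outcome has non-zero probability."""
    kept = {k: a for k, a in state.items() if k[1] == value}
    prob = sum(abs(a) ** 2 for a in kept.values())
    return prob, {k: a / math.sqrt(prob) for k, a in kept.items()}

n = 3
f = lambda x: x % 3                          # hypothetical example function
state = apply_uf(uniform_superposition(n), f)   # the state (1) of the text
prob, post = measure_f_register(state, 1)
```

With these choices the outcome 1 occurs with probability 3/8, and the resulting state is the equally weighted superposition of |1, 1>, |4, 1> and |7, 1>, that is, of all i in I = {i | f(i) = 1}.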
3.3 Quantum Principles

Several quantum principles play an important role in quantum information pro- 
cessing and communication. 

1. Quantum information cannot be (perfectly) cloned.^5

2. Gaining classical information from quantum states causes, in general, their 
disturbance. 

3. Quantum measurements are, in general, irreversible. (However, as discussed 
later, in some interesting cases this does not have to be so.) 

4. Nonorthogonal quantum pure states cannot be faithfully distinguished.^6,7

5. Mixed states with the same density matrix cannot be physically distin- 
guished. 

6. Entanglement cannot be increased by local unitary operations and classical 
communication. 

All these principles influence quantum information processing in several ways.
For example, they make some tasks that are possible classically impossible at
the quantum level, and some tasks that are impossible classically possible at
the quantum level.
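The first principle has a short standard justification based only on the linearity of quantum evolution. The derivation below is the usual textbook argument, not one given in the paper; it assumes a hypothetical cloning operator $U$ and shows that it cannot act correctly on a superposition.

```latex
% Assume, for contradiction, a unitary U that clones every qubit state:
%   U(|\phi\rangle|0\rangle) = |\phi\rangle|\phi\rangle .
\begin{align*}
U(|0\rangle|0\rangle) &= |0\rangle|0\rangle, \qquad
U(|1\rangle|0\rangle) = |1\rangle|1\rangle
  && \text{(cloning the basis states)} \\
U\Big(\tfrac{1}{\sqrt{2}}(|0\rangle + |1\rangle)|0\rangle\Big)
  &= \tfrac{1}{\sqrt{2}}\big(|0\rangle|0\rangle + |1\rangle|1\rangle\big)
  && \text{(by linearity of } U\text{)} \\
\tfrac{1}{\sqrt{2}}(|0\rangle + |1\rangle) \otimes \tfrac{1}{\sqrt{2}}(|0\rangle + |1\rangle)
  &= \tfrac{1}{2}\big(|00\rangle + |01\rangle + |10\rangle + |11\rangle\big)
  && \text{(what cloning would require)}
\end{align*}
% The last two right-hand sides differ, so no such U exists.
```

The two results disagree, so a transformation that clones the basis states cannot also clone their superposition.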



3.4 Apt Killers 

Several results have been obtained in quantum information processing that have
turned the attention of the whole science and technology community, and also of
many outside of this community, to quantum computing and that have contributed
much to the excitement this field creates and to the support it gets.

1. Shor’s algorithms for factorization and discrete logarithm computation.
These results imply that quantum computers, even if they are designed only in
a few decades, could jeopardize the security of some current cryptosystems and
digital signature schemes.^8

2. The existence of quantum error correcting codes and of a set of universal
quantum fault tolerant gates. These results made the vision of a quantum
computer far more realistic than before.

3. Quantum cryptography - theoretical and experimental. More exactly, quantum
methods of unconditionally secure key distribution, which were very soon
followed by successful experiments.

^5 The so-called no-cloning theorem says that there is no unitary transformation
$U$ such that for any qubit state $|\phi\rangle$,
$U(|\phi\rangle|0\rangle) = |\phi\rangle|\phi\rangle$. (An amazing result - sheep
can be cloned, but photons cannot.)

^6 The need to distinguish nonorthogonal states occurs in many applications and
quantum algorithms.

^7 Indistinguishability of nonorthogonal states is also a positive phenomenon. For
example, compression of a sequence of nonorthogonal quantum states can go beyond
the limits for the compression of classical bits, as stated by the Shannon theorem.
Moreover, quantum cryptography is largely based on the fact that if a quantum
system is prepared in one of two nonorthogonal states, then any attempt to
distinguish these two possibilities necessarily leads to a disturbance.

^8 These results were soon followed by a more general one, almost unnoticed by the
community at large, due to Boneh and Lipton [5], showing that with quantum
computers one could also break other cryptosystems, including the now so popular
elliptic curve cryptosystems based on the elliptic curve discrete logarithm.
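To make the first item more tangible: factorization is commonly presented as reducing to finding the order of a number modulo N, and only that order-finding step needs a quantum computer. The sketch below, in plain Python, shows the classical part of this reduction as it is usually described; the order is found here by brute force, which takes exponential time in general, and the function names are hypothetical. It is an illustration under these assumptions, not the algorithm as given in Shor's paper.

```python
from math import gcd

def order(a, n):
    """Smallest r > 0 with a**r % n == 1, found by brute force.

    This is the step a quantum computer would perform efficiently.
    Requires gcd(a, n) == 1."""
    r, value = 1, a % n
    while value != 1:
        value = (value * a) % n
        r += 1
    return r

def factor_from_order(a, n):
    """Classical post-processing: try to split n using the order of a mod n."""
    common = gcd(a, n)
    if common != 1:
        return common, n // common       # lucky guess: a shares a factor with n
    r = order(a, n)
    if r % 2 == 1:
        return None                      # odd order: try another a
    half = pow(a, r // 2, n)
    if half == n - 1:
        return None                      # a^(r/2) = -1 mod n: try another a
    p = gcd(half - 1, n)
    return p, n // p

factors = factor_from_order(7, 15)
```

For N = 15 and a = 7 the order is 4, and the post-processing splits 15 into 3 and 5; for a = 14 the attempt fails and another a has to be tried.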

3.5 Obstacles 

Decoherence remains the number one problem, in spite of a variety of
contributions in the area of quantum error correcting codes, fault tolerant
computation, concatenated codes, and quantum repeaters that show that reliable
storage, transmission and processing of quantum information, in time and space,
is, in principle, possible.

In spite of that, there are doubts about when and whether we can really have
powerful quantum processors. The research in quantum information processing can
be seen as a constructive “fight” between pessimists and optimists, with the
optimists currently having the edge.

4 Quantum Challenges 

Let us now turn to the main topic of this paper: the major challenges in the
area of quantum information processing.



4.1 Quantum Processors 

Progress and success in designing quantum processors is of very large (almost
key) importance for the overall standing of the whole field of QIP. The main
burden here is on (experimental) physicists and engineers, but the involvement
of (theoretical) computer scientists can also bring large benefits. By creating
and investigating models, computation modes and methods for the technologies
being suggested and developed, they can contribute to and speed up the
development and utilization of these technologies. NMR technology and
computation, and the theoretical contributions to them, are a good example.
Two main challenges in this area can be seen as follows.

1. The search for a proper technology for QIP, one suitable for implementing,
easily and reliably, a universal set of quantum computational primitives,
the setting of the initial state and readouts, and for managing decoherence.

Three technologies have been developed and investigated especially
intensively so far: nuclear magnetic resonance (NMR), trapped ions (especially
a chain of ions in a linear radio-frequency trap), and cavity QED. NMR
technology in particular has been much studied and developed, and has been
used to perform 2- to 7-qubit experiments [31]. However, none of these
technologies seems to scale well, and one can expect that a breakthrough in
this area will require turning to silicon-based technologies such as quantum
dots [17] or Kane’s [29] proposal. (The basic idea is that spins of electrons
confined to localized states in solid-state structures have the prospect of
serving as qubits. Various physical mechanisms for implementing quantum gates
using such technologies have been proposed and are under investigation.)
Another possibility that needs to be explored is the use of superconductors
and Josephson junctions.

2. The search for simple universal sets of quantum computational primitives is
another fundamental and important task. Results in this area can strongly
influence both the choice of suitable technologies and the search for them.

Progress in this area has been remarkable and is worth summarizing:
(1) Deutsch’s three-qubit gate (1989); (2) DiVincenzo’s two-qubit universal
gate (1995); (3) the universality of the XOR gate and one-qubit gates, due to
Barenco et al. [2]; (4) the universality of the XOR, Hadamard and
$\sigma_z^{1/4} = \mathrm{diag}(1, e^{i\pi/4})$ gates, due to [6]; (5) the
universality of the GHZ state $\frac{1}{\sqrt{2}}(|000\rangle + |111\rangle)$,
Bell measurements and one-qubit quantum gates [20].
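
As a small illustration of the primitives listed above, the following sketch
writes the Hadamard and XOR gates as matrices, checks that they are unitary,
and uses them to produce a Bell state and the GHZ state from basis states.
It assumes NumPy and the convention that the first qubit is the most
significant one; the variable and function names are illustrative only.

```python
import numpy as np

H = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2)   # Hadamard gate
XOR = np.array([[1.0, 0.0, 0.0, 0.0],
                [0.0, 1.0, 0.0, 0.0],
                [0.0, 0.0, 0.0, 1.0],
                [0.0, 0.0, 1.0, 0.0]])                 # control = first qubit
I2 = np.eye(2)

def is_unitary(u):
    """Check that u times its conjugate transpose is the identity."""
    return np.allclose(u.conj().T @ u, np.eye(u.shape[0]))

ket0 = np.array([1.0, 0.0])
ket00 = np.kron(ket0, ket0)
ket000 = np.kron(ket00, ket0)

# Bell state: Hadamard on the first qubit, then XOR.
bell = XOR @ np.kron(H, I2) @ ket00

# GHZ state: Hadamard on qubit 0, XOR on qubits (0, 1), then XOR on qubits (1, 2).
ghz = np.kron(I2, XOR) @ np.kron(XOR, I2) @ np.kron(np.kron(H, I2), I2) @ ket000

print(np.round(bell, 3))   # amplitudes of |00>, |01>, |10>, |11>
print(np.round(ghz, 3))    # amplitudes of |000>, ..., |111>
```

Only the amplitudes of |00> and |11> (respectively |000> and |111>) are
nonzero, each equal to 1/sqrt(2).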

Two other challenges in this area are: 

1. Experiments - proof-of-principle demonstrations of basic quantum
computational primitives (entangled states, especially of more qubits,
teleportation, XOR gates, Bell measurements, the Zeno effect, ...) and
experimental verification of simple quantum algorithms, protocols, error
correcting codes, and fault-tolerant gates.

The Innsbruck experimental demonstration of quantum teleportation has been
considered one of the three most important experimental achievements in
physics in 1998. Various other demonstrations of quantum teleportation have
been reported. However, as analysed by Vaidman [46], none of the experiments
up to then could actually be seen as a 100% demonstration of quantum
teleportation. Something was always missing (for example, Bell measurements
could not be performed fully reliably). In spite of that, Vaidman [46]
concludes that though a perfect teleportation of an unknown quantum state has
not been achieved, the experiments performed so far show that it can be done.
In addition, the so-called teleportation of continuous variables, suggested
by Braunstein and Kimble [8], has been fully demonstrated by Furusawa et
al. [18], making it the first reliable teleportation experiment.

Much more insight into quantum teleportation is needed, and quite a bit of
work has been done in this area. For example, Pati [41] has shown that 1 bit
is sufficient to “teleport” a known quantum state using an EPR channel, and
that 1.19 bits are needed on average in the case where Alice and Bob share
local hidden variables. Hardy [23] challenges the widespread belief that
teleportation is a non-local effect by constructing a local theory in which
cloning is not possible, but teleportation is. Deutsch and Hayden [16] also
see quantum teleportation as a local phenomenon.

2. Theoretical studies of models and computational modes corresponding to
particular technologies, for example of ways to establish an initial state, to
perform measurements (readouts), and so on.
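
To make clear what an ideal teleportation experiment has to realize, the
standard protocol for one qubit can be simulated with a few lines of linear
algebra. The sketch below is an idealized, noise-free model, not a model of
any of the experiments cited above; it assumes NumPy, and the function name
`teleport` is illustrative. For each of the four Bell-measurement outcomes
it applies the correction selected by the two classical bits and returns
Bob’s resulting state.

```python
import numpy as np

H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
I2 = np.eye(2, dtype=complex)
XOR = np.array([[1, 0, 0, 0],
                [0, 1, 0, 0],
                [0, 0, 0, 1],
                [0, 0, 1, 0]], dtype=complex)   # control = first qubit

def teleport(psi):
    """Map each outcome (m0, m1) to (probability, Bob's corrected state)."""
    # Qubit 0 holds psi; qubits 1 (Alice) and 2 (Bob) share a Bell state.
    bell = np.array([1, 0, 0, 1], dtype=complex) / np.sqrt(2)
    state = np.kron(psi, bell)
    # Bell measurement = XOR (0 -> 1), Hadamard on qubit 0, then readout.
    state = np.kron(XOR, I2) @ state
    state = np.kron(np.kron(H, I2), I2) @ state
    amps = state.reshape(2, 2, 2)               # indices [m0, m1, bob]
    results = {}
    for m0 in (0, 1):
        for m1 in (0, 1):
            bob = amps[m0, m1, :]
            prob = float(np.vdot(bob, bob).real)
            bob = bob / np.sqrt(prob)
            if m1:                              # second classical bit: apply X
                bob = X @ bob
            if m0:                              # first classical bit: apply Z
                bob = Z @ bob
            results[(m0, m1)] = (prob, bob)
    return results

psi = np.array([0.6, 0.8j])
for outcome, (prob, bob) in teleport(psi).items():
    print(outcome, round(prob, 3), np.allclose(bob, psi))
```

Each outcome occurs with probability 1/4, and in every case Bob ends up with
the original state, which is why reliable Bell measurements are the critical
experimental ingredient.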



4.2 Quantum Cryptography 

The main goal in this area is to move from the current experimental stage to
a development stage and to practically useful quantum cryptographic systems.
However, there may still be a long way to go.

Judging solely by the distance over which “successful” photon transmission
experiments have been performed, the progress has been remarkable: from 32 cm
in 1989 to the recent 47 km using optical fibres and 1 km (0.5 km) in open
air at night (day) time, with 2 to 7 km experiments in open air and daytime
transmissions announced - see [21] for references. However, a closer look at
other quantitative and qualitative characteristics of the experiments
performed reveals that much still has to be improved to reach the quality
needed for the development and application stages.

Several major challenges in this area can be identified. 

1. Earth-to-orbit experimental transmission of photons or other particles. (This
would be the culmination of the current proof-of-principle experimental stage
in quantum cryptography.)

2. To study the limitations of real quantum transmissions and to explore their
quantitative and qualitative characteristics (speed, error rate, ...).

3. To explore, theoretically and experimentally, tools that allow transmission
of quantum information over long distances and/or long time periods, for
example quantum repeaters, concatenated codes and so on.

Several other challenges deal with the unconditional security of quantum
cryptographic protocols - actually the key problem of quantum cryptography.

4. An identification and a study of the main cryptographic attacks. It is the
existence of sophisticated quantum attacks that makes proofs of the
unconditional security of quantum key generation protocols so difficult. To
see the dimension of the problem, let us list the main types of quantum
attacks an eavesdropper can perform.

(a) Intercept-resend attacks. The eavesdropper measures each signal sent and
then resends it.

(b) Individual-particle attacks. The eavesdropper tries to learn as much as
possible from each signal sent by first letting the signal interact with an
ancilla and then performing a measurement on the ancilla.

(c) Collective attacks. The eavesdropper brings all transmitted particles into 
interaction with an ancilla and at the end measures all ancillas together 
to extract some information about the key. 

(d) Coherent or joint attacks. Instead of measuring the particles while they
are in transit, the eavesdropper regards all transmitted particles as a
single entity. She then couples this entity with an ancilla. Afterwards she
sends the particles on and keeps the ancilla. After the public interactions
the eavesdropper extracts from the ancilla some information about the key.

Other types of attacks have been identified in experimental cryptography:

(a) Trojan horse attacks. This refers to the possibility that an entanglement
with the environment can “open the door” and let a “Trojan horse” get in and
gather information about quantum communications in the cryptographic
protocols.

(b) Beam splitter attacks and photon number splitting attacks (Bennett et
al., 1992; N. Lütkenhaus). This refers to the fact that producing perfect
single-photon pulses seems to be beyond current technology. An eavesdropper
can therefore test whether there is more than one photon in a pulse and, if
this is the case, use a beam splitter to get one of them. In some situations
this can be utilized to get full information about the polarization of the
transmitted photons and about the key.

5. Security of quantum key generation (QKG) protocols. Mayers and Yao [38]
have shown the unconditional security of the BB84 protocol under the
assumption that the photon source is perfect. The proof is of the
computational complexity type and quite complex, and so are the final
technical statements. In addition, the proof does not provide a method for
calculating the error rate for real implementations. Much simpler is the
recent proof by Lo [35] (see also the web updates to [21]), which is, however,
for a more sophisticated entanglement-based protocol in which the key
generating parties need quantum computers. The search for new protocols and
for proofs of their unconditional security will surely continue.

6. Security of cryptographic protocols. A big disappointment was the discovery
by Lo and Chau in 1996, and by Mayers in 1997, that unconditionally secure bit
commitment protocols - an important primitive for quantum cryptographic
protocols - do not exist. The result implies the impossibility of
unconditional security for ideal quantum coin-tossing protocols and for
1-out-of-2 quantum oblivious transfer protocols. Unconditional security has
been shown only for a very simple quantum gambling protocol [19]. An important
open problem is whether there is an unconditionally secure cryptographic
protocol for the non-ideal quantum coin-tossing problem (see the web update to
[21] for more details).
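
The simplest attack in the list above, the intercept-resend attack, already
shows why eavesdropping leads to a detectable disturbance. The following
sketch computes the exact error rate it causes in the sifted key of BB84.
It is an idealized model under stated assumptions: perfect single-photon
signals, no channel noise, and an eavesdropper who chooses each of the two
bases with probability 1/2. All names are illustrative.

```python
import numpy as np
from itertools import product

# Basis 0 = rectilinear {|0>, |1>}, basis 1 = diagonal {|+>, |->}.
BASES = {
    0: [np.array([1.0, 0.0]), np.array([0.0, 1.0])],
    1: [np.array([1.0, 1.0]) / np.sqrt(2), np.array([1.0, -1.0]) / np.sqrt(2)],
}

def outcome_prob(state, basis, bit):
    """Probability of reading `bit` when `state` is measured in `basis`."""
    return float(np.dot(BASES[basis][bit], state) ** 2)

def intercept_resend_error_rate():
    """Exact error rate in the sifted key when every signal is intercepted."""
    total, cases = 0.0, 0
    for bit, alice_basis, eve_basis in product((0, 1), repeat=3):
        sent = BASES[alice_basis][bit]
        wrong = 0.0
        for eve_bit in (0, 1):
            p_eve = outcome_prob(sent, eve_basis, eve_bit)
            resent = BASES[eve_basis][eve_bit]
            # Sifted key: Bob measured in the same basis Alice used.
            wrong += p_eve * outcome_prob(resent, alice_basis, 1 - bit)
        total += wrong
        cases += 1
    return total / cases

print(intercept_resend_error_rate())
```

The eavesdropper guesses the wrong basis half of the time, and each wrong
guess randomizes the receiver’s bit, so a quarter of the sifted key bits
disagree. The more sophisticated attacks listed above are harder to analyse
precisely because they do not leave such a simple signature.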



4.3 Quantum Entanglement 

Almost all deeply quantum problems in QIP deal, in some way, with quantum
entanglement, which is considered to be the most precious quantum resource.
A better understanding of entanglement, of the ways it is created, stored,
manipulated (transformed from one form to another), transmitted (even
teleported) and consumed (to do useful work), as well as of the various types
and measures of entanglement, is theoretically perhaps the most basic task of
current QIP research, which could also be characterized as being in the stage
of gathering phenomenology. Let us summarize the main challenges in this area.

1. How to determine (in a sufficiently easy way) whether a given state is
entangled. For pure states this is easy, but for density matrices the problem
is not solved yet. Well known is Peres’ [42] simple partial transpose
condition, which is necessary for separability and which has also been shown
to be sufficient, by Horodecki et al. [28], for the case of 2 x 2 and 2 x 3
compound systems. Solving the problem in the general case is a hot current
research topic.

2. Measures of entanglement. Some quantum states, say
$\frac{1}{\sqrt{2}}(|00\rangle + |11\rangle)$, are clearly more (maximally)
entangled than others, say
$\frac{1}{\sqrt{k}}|00\rangle + \sqrt{\frac{k-1}{k}}|11\rangle$, $k > 2$. The
amount of entanglement in a maximally entangled state of two qubits is said to
be one ebit.^9 For pure quantum bipartite states
$|\psi_\alpha\rangle = \alpha|00\rangle + \sqrt{1 - \alpha^2}|11\rangle$, the
entropy of the entanglement

$E(|\psi_\alpha\rangle) = -\alpha^2 \lg \alpha^2 - (1 - \alpha^2) \lg(1 - \alpha^2)$

is a good measure of entanglement.^10 For mixed states (density matrices) it
seems that there is no single best measure of entanglement. Several measures
of entanglement of bipartite states have been introduced and investigated (as
well as the relations among them) - see [21] for an overview. For example:
(a) entanglement of formation, $E_f(\rho)$, the least expected entanglement of
any ensemble of pure states with $\rho$ as its density matrix;
(b) entanglement of distillation, $E_d(\rho)$, the maximum yield of singlet
states that can be produced from the given mixed state by local operations and
classical communication; (c) total entanglement,
$E_F(\rho) = \lim_{n \to \infty} \frac{E_f(\rho^{\otimes n})}{n}$, also called
(a regularized version of the) “entanglement of formation”. By Horodecki et
al. [26], $E_d(\rho) \le E(\rho) \le E_F(\rho)$ for any “good” measure of
entanglement $E$.

3. Creation of pure (maximally) entangled states. One can consider a physical
source producing a (mixed) entangled state, or ways to produce entangled
states from unentangled ones (e.g.
$\mathrm{XOR}\big(\frac{1}{\sqrt{2}}(|0\rangle + |1\rangle)|0\rangle\big) = \frac{1}{\sqrt{2}}(|00\rangle + |11\rangle)$),
or procedures that transform noisy entangled states into pure entangled
states (entanglement purification, or distillation), as well as procedures
that produce maximally entangled states from weakly entangled states
(entanglement concentration). All these cases have been investigated, but much
more research is needed to get efficient techniques. In some important cases,
say for quantum teleportation or quantum cryptography, maximally entangled
states are needed. Entanglement purification techniques are among the most
important ones in the area of entanglement manipulation.

4. Operations with entangled states. Two types of operations are of interest:
(1) operations that do not change the amount of entanglement and only change
it from one form to another; (2) operations that change the amount of
entanglement. Among the operations of the second type let us mention
entanglement splitting [9] - how and when two parties sharing an entangled
state can provide a third party with some of their entanglement - and the
related distributed entanglement (see [15]), entanglement swapping and so on.

^9 It is also said that two particles in a maximally entangled state form an
ebit.

^10 Entropy of the entanglement also provides an elegant characterization of
the potential to create $n$ singlets from $k$ states $|\psi_\alpha\rangle$
using only local operations. Indeed, as shown by Bennett, Bernstein, Popescu
and Schumacher in 1996,
$E(|\psi_\alpha\rangle) = \lim_{n,k \to \infty} \frac{n}{k}$.

Of special interest has been the problem of when one entangled state, say
$|\phi\rangle$, can be transformed into another one, say $|\psi\rangle$, by
local operations and classical communication. The problem has been solved by
Nielsen [40], who found a simple necessary and sufficient condition.

5. Types of entanglement and the structure of the space of entangled states.
The research in this area seems to be at the very beginning and one can
expect a lot to come. An important and surprising discovery along these lines
has been made by Horodecki et al. [28]. They showed that there exist a free,
distillable, entanglement and a bound, nondistillable, entanglement.

6. Entanglement as a computational and communicational resource. Such a role
of entanglement is often much emphasized. However, much more research seems to
be needed to get a better understanding of what such a general thesis really
means. For example, Lo and Popescu [37] explored the question of whether one
needs classical communication for such entanglement manipulations as
entanglement concentration (a transformation of arbitrary states into
singlets) and entanglement dilution (a transformation of n singlets into
a given bipartite state). They showed that concentration needs practically no
classical communication, and that for dilution the amount of classical
communication converges to 0 as n goes to infinity.

7. Entanglement as a computation and communication primitive. Gottesman and
Chuang [20] have shown that by teleporting qubits through special entangled
states one can create a variety of interesting gates. This made it possible to
establish that the GHZ state, quantum measurement and one-qubit gates form
a universal set of quantum computation primitives.

8. Simulation of entanglement by classical communication. If the measurements
of two particles in an entangled state are space-separated, then, of course,
there is no way to simulate them classically. However, if the measurement
events are time-separated, then such a simulation is possible, and Brassard et
al. [7] establish the cost of such a simulation for the case of a system in
n Bell states.

9. Multi-particle entanglement. Understanding, characterization,
classification and quantification, as well as manipulation, of multi-particle
entangled states is a very important and difficult problem. It is perhaps the
main current challenge of quantum information theory, and it needs to be
explored in order to get more insight into the possibilities of distributed
quantum computing and networks. The problem is also of interest outside of
QIP, as discussed in Section 4.13.

Error correcting codes are an important example of an area where
multi-particle entanglement plays a crucial role and where quite a few
insights into it have been obtained.

10. Analogies and principles. For the overall development of the field of QIP
it is also important to try to discover analogies and general principles that
are valid but cannot be derived directly from current quantum mechanics. As
an example, let us mention attempts to develop an analogy between the role
information plays in physics and the role energy plays in thermodynamics, as
well as such principles as the no-increase-of-entanglement principle, which
says that the entanglement of a compound quantum system does not increase
under unitary processes in one of the subsystems [27].
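
Two of the items above, the partial transpose test of item 1 and the entropy
of entanglement of item 2, are easy to evaluate numerically for two qubits.
The following is a minimal sketch assuming NumPy. The function names are
illustrative, and the Werner state used as a test case is an assumption
chosen for illustration; it does not appear in the discussion above.

```python
import numpy as np

def entanglement_entropy(alpha):
    """Entropy of entanglement, in ebits, of alpha|00> + sqrt(1 - alpha^2)|11>."""
    probs = np.array([alpha ** 2, 1.0 - alpha ** 2])
    probs = probs[probs > 0]                  # convention: 0 * lg 0 = 0
    return float(-np.sum(probs * np.log2(probs)))

def partial_transpose(rho):
    """Transpose the second qubit of a two-qubit density matrix."""
    return rho.reshape(2, 2, 2, 2).transpose(0, 3, 2, 1).reshape(4, 4)

def violates_ppt(rho, tol=1e-12):
    """True if the partial transpose has a negative eigenvalue.

    For 2 x 2 systems this is equivalent to the state being entangled.
    """
    return bool(np.linalg.eigvalsh(partial_transpose(rho)).min() < -tol)

def werner_state(p):
    """Mixture of the singlet (weight p) and the maximally mixed state."""
    singlet = np.array([0.0, 1.0, -1.0, 0.0]) / np.sqrt(2)
    return p * np.outer(singlet, singlet) + (1.0 - p) * np.eye(4) / 4.0

print(entanglement_entropy(1 / np.sqrt(2)))   # maximally entangled: about 1 ebit
print(entanglement_entropy(0.5))              # weaker entanglement: below 1 ebit
print(violates_ppt(werner_state(0.5)), violates_ppt(werner_state(0.2)))
```

For the Werner state the smallest eigenvalue of the partial transpose is
(1 - 3p)/4, so the test reports entanglement exactly when p > 1/3. For
larger systems the same test remains necessary for separability but is no
longer sufficient.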

4.4 Decoherence 

The understanding and managing of decoherence is of crucial importance for 
experimental QIP and for the design of quantum processors. However, decoher- 
ence is also a crucial theoretical problem. 

Views of decoherence still differ considerably, from more pragmatic ones to
highly theoretical ones. For example: “decoherence is connected with the
storage of information about the decohering system somewhere in the
universe”, “decoherence is a process of coupling of a quantum system with its
environment”, or “decoherence can be regarded as measurements of the quantum
system by the environment”.
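
The last of these views can be illustrated by a toy model: a single system
qubit in a superposition is coupled by an XOR gate to a single environment
qubit, which thereby “measures” it. The sketch below assumes NumPy, and
a one-qubit environment is of course a drastic simplification. It shows
that the off-diagonal (coherence) terms of the system’s density matrix
disappear once the environment is traced out.

```python
import numpy as np

XOR = np.array([[1.0, 0.0, 0.0, 0.0],
                [0.0, 1.0, 0.0, 0.0],
                [0.0, 0.0, 0.0, 1.0],
                [0.0, 0.0, 1.0, 0.0]])     # control = system, target = environment

def reduced_system_state(state):
    """Density matrix of the system qubit after tracing out the environment."""
    amps = state.reshape(2, 2)             # indices [system, environment]
    return amps @ amps.conj().T

plus = np.array([1.0, 1.0]) / np.sqrt(2)   # system in (|0> + |1>)/sqrt(2)
env0 = np.array([1.0, 0.0])                # environment starts in |0>

before = reduced_system_state(np.kron(plus, env0))
after = reduced_system_state(XOR @ np.kron(plus, env0))

print(np.round(before, 3))   # all entries 0.5: full coherence
print(np.round(after, 3))    # diagonal only: coherence lost
```

Before the interaction all four entries equal 1/2; afterwards only the
diagonal survives, exactly as if the system had been measured in the
computational basis and the result discarded.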

In spite of intensive research in this area, only moderate progress seems to
be visible. Finding ways to cope with decoherence is often considered a task
for physicists and experimental QIP. However, the field seems to need brand
new ideas, and it seems to be one of the areas (theoretical) computer
scientists should enter.

Let us identify some subareas on which researchers should focus more
attention.

1. An understanding of decoherence. A realistic investigation of the sources
of decoherence, of the forms in which it manifests itself, and of the impacts
it can have.

2. The development of tools to fight decoherence. Two main types of such tools
have been identified: (a) active, for example error correcting codes, which
fight decoherence by repeated applications of error correction procedures;
(b) passive, for example decoherence-free subspaces (DFS), where the structure
of the particular decoherence process is used to protect coherence and the
symmetries in the decoherence process are utilized. (DFS seem to be ideal for
quantum memory applications, but for quantum computing applications they
should be combined with QECC, even if DFS can be seen as a special type of
QECC with very simple recovery operators [1].)

3. Making positive use of decoherence. It is becoming a piece of wisdom in QIP
that everything bad is good for something. Let us therefore try to make
positive use of decoherence. According to Knight et al. [30], decay can lead
to entanglement of quantum systems, and detection of decay can be used as
a method of state preparation (and can also allow a wide range of
communicational tasks - even entanglement).

4.5 Managing Decoherence and Imperfections

One area where unquestionably important progress has been made, with
potentially broad impacts also outside quantum information processing, and
where new ideas and large progress are expected and much needed, concerns the
problem of how to manage imperfections of quantum processes due to
decoherence, decay, noise and imprecision - that is, how to protect quantum
states from interactions with the environment.

Imperfections of quantum processes and their coupling with the environment,
which make quantum coherence so fragile, are the main argument of quantum
pessimists against quantum computing and the main worry of quantum computer
design optimists. The impossibility of using majority voting techniques, so
useful in the classical case, due to the no-cloning theorem and other special
properties of quantum states and processes, created the belief that error
correcting codes are impossible in the quantum case.

Shor, in 1995, and a bit later also Steane, found ways to use quantum
entanglement to design quantum error correcting codes (QECC). (QECC also
make it possible to keep an unknown quantum state unchanged in a noisy
environment, i.e. to create a long-term quantum memory.) Shortly after,
quantum fault-tolerant computation was shown by Shor to be potentially
possible. (Its existence is an important fact because the main real problem
with error correction is the potential for new errors introduced by the error
correction procedures themselves.)

In order to deal with imperfections in quantum processes, the following
techniques have been invented and are expected to be further developed.

1. Quantum error correcting codes. In order to be able to correct t
(independent) errors on real qubits, the basic idea is to encode the basis
states, $|0\rangle$ and $|1\rangle$, by entangled states (called logical
qubits) of n qubits such that no information resides in any subset of 2t
qubits, i.e. the density matrices of any 2t qubits are random. Information is
therefore encoded using n qubits, but the encoded information has a “global
character”: there is no way to access any of it by measuring only a few
qubits.

A variety of quantum error correcting codes and methods to design codes 
have already been developed (see [21], for a presentation and references). 
Progress in this area has been due to the following factors: (1) discretization 
of errors has been shown to be possible; (2) error correction principles have 
been discovered; (3) methods to design codes have been found. 

The area seems to be passing firmly into the hands of error-correcting
specialists. Progress is expected to come from the following directions:
(1) deep techniques of classical error correcting codes will be adapted to the
quantum case; (2) QECC will be developed for specific quantum processes, with
specific types and patterns of errors, making use of specific physical
properties of these processes; (3) QECC will also be developed for the case of
collective errors and for quantum systems that are not decomposable into
qubits [32].

2. Development and exploration of other methods of fighting decoherence, such
as entanglement purification, symmetric subspaces, decoherence-free subspaces,
making use of the quantum Zeno effect (QZE) and so on.^11

3. Development of quantum error prevention and detection techniques. 

4. Quantum fault-tolerant computations and gates. Two basic approaches are:

(a) An algorithmic approach: to design methods for performing fault-tolerant
computations on logical qubits that simulate quantum gates in
a fault-tolerant way. Shor showed in 1996 that quantum fault-tolerant
computation is possible and gave a fault-tolerant implementation of
a universal set of gates. (His results are theoretically important and
encouraging, but the suggested constructions are too complex for
applications.) Since then significant progress in this area has been made
(and is still needed). See, for example, the much simpler set of universal
fault-tolerant gates due to Boykin et al. [6] and the method for designing
fault-tolerant gates due to Gottesman and Chuang [20].

(b) A hardware approach: to develop physically inherently fault-tolerant
gates. Kitaev made such a proposal in 1997. However, the point is to come up
with an idea implementable with current technology.
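
The idea behind item 1 above, encoding a qubit into an entangled state so
that an error can be located without learning anything about the encoded
information, can be seen in the smallest example, the three-qubit bit-flip
code. The sketch below assumes NumPy and uses illustrative names. It
corrects only a single bit-flip (X) error and no phase errors, so it is
a teaching example rather than a full quantum error correcting code of the
kind discussed above.

```python
import numpy as np

I2 = np.eye(2)
X = np.array([[0.0, 1.0], [1.0, 0.0]])     # bit flip
Z = np.array([[1.0, 0.0], [0.0, -1.0]])

def op_on(qubit, gate):
    """Embed a one-qubit gate acting on `qubit` (0, 1 or 2) into three qubits."""
    mats = [I2, I2, I2]
    mats[qubit] = gate
    return np.kron(np.kron(mats[0], mats[1]), mats[2])

def encode(a, b):
    """Logical qubit a|000> + b|111>."""
    state = np.zeros(8, dtype=complex)
    state[0b000] = a
    state[0b111] = b
    return state

def parity(state, q1, q2):
    """Outcome (0 or 1) of measuring Z on q1 times Z on q2.

    After at most one bit flip the state is an eigenstate of this observable,
    so the outcome is deterministic and reveals nothing about a and b.
    """
    expectation = np.vdot(state, op_on(q1, Z) @ op_on(q2, Z) @ state).real
    return 0 if expectation > 0 else 1

def correct(state):
    """Locate a single flipped qubit from the two parities and flip it back."""
    syndrome = (parity(state, 0, 1), parity(state, 1, 2))
    flipped = {(1, 0): 0, (1, 1): 1, (0, 1): 2}.get(syndrome)
    return state if flipped is None else op_on(flipped, X) @ state

logical = encode(0.6, 0.8j)
for k in range(3):
    damaged = op_on(k, X) @ logical
    print(k, np.allclose(correct(damaged), logical))
```

The two parity measurements identify which qubit was flipped but are
identical for |000> and |111>, so the superposition survives. This is how
the code avoids both the no-cloning obstacle and the need for majority
voting on copies.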

4.6 Quantum Measurement 

This is, perhaps, after quantum non-locality, the most controversial problem
of quantum mechanics, studied and discussed for more than 60 years without
a real breakthrough.

The problem of measurement has already been studied from many points of view,
even on the level of the philosophy of science. Experimental physicists have
not really run into difficulties with the Copenhagen interpretation of quantum
measurement, which sees the projection measurement as an external
intervention into quantum evolution, but which leaves open some very basic
questions about how measurements are performed in Nature and how many
resources they need. The needs of quantum information processing require us
to deal with this problem from new points of view and with new aims, which may
be beneficial for a better understanding of quantum measurement. Let us
summarize some of the new questions.

- How many resources do quantum measurements really need? This question has
been left practically untouched till now. Actually, there has not been much of
a real need to answer it. However, now that one searches for quantitative
statements concerning the power and limitations of quantum information
processing, this question should not be left open. The answer can be of
importance for the overall view of the potential of QIP. In principle, it
could happen that the resources required grow exponentially with the number
of qubits. This may have a significant impact on the aims, methods and
results of quantum computational complexity.

^11 QZE is the name for the phenomenon of freezing (slowing down) the quantum
evolution of a frequently (continuously) measured quantum system.

- How important a computational primitive are quantum measurements actually?
As observed by Castagnoli et al. [12], by performing a measurement one can
get, in one computational step and apparently without “any resources”,
a solution of a complicated system of constraints (inequalities), incorporated
into the resulting projection. A projection measurement can therefore be
a powerful computational primitive, and this is essentially what some
efficient quantum algorithms make use of. With this in mind, the question of
the amount of resources quantum measurements need acquires a new dimension and
urgency.

- Which quantum observables are (currently) implementable, and what kind of
resources do they need? Quantum theory allows us to use a rich variety of
observables. Far less is known about how to use them properly. In addition, it
is not clear which ones are (currently) implementable, and with which
technology. For example, there are still hardly any good ways known to
distinguish the four Bell states faithfully by one observable.
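
On paper, distinguishing the four Bell states is simple: an XOR gate followed
by a Hadamard gate on the control qubit maps them to the four computational
basis states, which can then be read out directly. The sketch below (NumPy
assumed, names illustrative) verifies this reduction. It says nothing about
how hard the XOR gate is to realize physically, which is exactly the
difficulty mentioned above.

```python
import numpy as np

H = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2)
I2 = np.eye(2)
XOR = np.array([[1.0, 0.0, 0.0, 0.0],
                [0.0, 1.0, 0.0, 0.0],
                [0.0, 0.0, 0.0, 1.0],
                [0.0, 0.0, 1.0, 0.0]])      # control = first qubit

s = 1 / np.sqrt(2)
BELL_STATES = {
    "phi+": s * np.array([1.0, 0.0, 0.0, 1.0]),
    "psi+": s * np.array([0.0, 1.0, 1.0, 0.0]),
    "phi-": s * np.array([1.0, 0.0, 0.0, -1.0]),
    "psi-": s * np.array([0.0, 1.0, -1.0, 0.0]),
}

# XOR first, then a Hadamard gate on the control qubit.
DISENTANGLE = np.kron(H, I2) @ XOR

def bell_outcome(state):
    """Two-bit readout that occurs with certainty for a Bell state."""
    probs = np.abs(DISENTANGLE @ state) ** 2
    k = int(np.argmax(probs))
    assert np.isclose(probs[k], 1.0), "input is not one of the four Bell states"
    return format(k, "02b")

for name, state in BELL_STATES.items():
    print(name, bell_outcome(state))
```

Each Bell state yields a different two-bit readout, so a Bell measurement is
no harder than an XOR gate plus ordinary single-qubit readouts.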

Remark. As pointed out by Reif [45], quantum computing research has not been
paying attention to one very important issue, namely the overall amount of
such resources as energy and volume that quantum computation requires. The
problem is that, at least hypothetically, the need for these resources could
grow so fast (exponentially) with the number of qubits involved that this
would cancel all the advantages that quantum superposition, parallelism and
entanglement provide. The key problem here is the resources that measurements
may need.

Since quantum evolution is unitary, and consequently reversible, we can
assume that the amount of energy it needs is negligible. On the other hand,
the problem of how much energy and volume quantum measurements need is open
and seems to be very hard.

There are several fundamental issues involved: the nature of quantum
measurement, and the amount and types of quantum measurements one really
needs.

On the foundational side, this is an opportunity to realize the importance of
the fundamental difference between two basic approaches to quantum
measurement.

The Copenhagen interpretation assumes that a quantum measurement is a basic
operation performed by a macroscopic device (of the classical world).

The von Neumann interpretation considers the measured object (state) and also
the measuring device as parts of a larger quantum system. The measuring device
and the measurement process themselves are then seen as being realized by
microscopic systems and subject to quantum effects.

For experimental physics, where quantum measurements are performed by 
macroscopic devices, the von Neumann interpretation of quantum measurements 
has not been really relevant. 

The story may be different for quantum information processing with a larger
number of qubits. In this case the von Neumann interpretation can be of both
interest and importance, for at least two reasons.

1. As pointed out by Hay and Peres [24], the predictions provided by these two
interpretations can be different.

2. It is possible that the volume for quantum observation/measurement grows 
very fast with the number of qubits involved. 

In any case, (good) bounds are needed for total energy consumption and total 
volume for quantum measuring devices and measurements. 

Reif [45] provides arguments, but not a proof, that even attempts to do 
only “approximate quantum measurements” , that may be sufficient for quantum 
computations, may require exponential resources. 

A related hard problem is that of the number of measurement operations we
really need for various computational tasks. For some models, say one-way
quantum finite automata, there is an essential difference in recognition power
between the model requiring a measurement after each operation and the model
with only one measurement, at the end of each computation.

As pointed out by Bernstein in 1997, in the case of quantum Turing machines
one can postpone measurements to the end of the computation, using a special
XOR-ing trick. However, the problem needs to be investigated further.

It is also natural to ask the extreme question of whether we need
measurements at all: whether and when it is sufficient to consider the final
quantum state as the output. For a more detailed discussion of this subject
see [45].



4.7 Quantum Information Theory 

There is a belief that quantum information theory could develop into a theory
that would be used to explain and interpret all quantum phenomena. The
challenge is to do so, if possible.

In a broad sense, such areas as QECC and entanglement are parts of quantum
information theory. Among the other major areas let us mention:

1. An abstraction and study of the main quantum information processing prim- 
itives and relations among them. 

2. The development of the basic concepts of quantum information theory that 
would parallel and generalize those in the classical information theory. 

3. The development of a quantum analogue of the algorithmic information the- 
ory of Chaitin. The basic step here is to find quantum analogues to such 
concepts as Chaitin (or self-delimiting Kolmogorov) complexity. The first 
more involved attempt to do so has been due to Vitanyi [47]. He has intro- 
duced two variants of quantum Chaitin complexity of pure quantum states. 
One is through the length of the shortest classical program generating or 
approximating a given pure state. The second approach is to consider the 
length of the shortest quantum qubit program generating or approximating 
a given pure quantum state. Quantum laws, especially the no-cloning theorem,
make the development of the corresponding theory difficult. Additional
complications come from the fact that the number of possible quantum Turing
machines, as well as of quantum states, is uncountable. A fixed universal
quantum Turing machine cannot therefore generate all pure states - some have
to be approximated.

4. An understanding of quantum channels and their capacity to transfer
classical and quantum information [3]. In order to communicate through quantum
channels three communication primitives can be used: bit, qubit and
entangled states (ebits). A variety of concepts of quantum channel capacity
have been identified and investigated. In the case that quantum states are
transmitted through a quantum channel and classical communication between
the sender and the receiver in both directions is available to assist the
transmission, we speak about the assisted quantum capacity $Q_2$. In the case
quantum states are transmitted, but no classical communication is allowed,
we speak about the quantum capacity $Q$. Finally, in the case that classical
information is transmitted, we speak about the classical capacity $C$. In the last
case there are still four special subcases to consider, depending on whether
encoders and decoders are classical or quantum.

A deeper study of these capacities in general and for particular quantum
channels has still to be done. One of the open problems is whether $Q_2 > C$
for some channels.

The quality of quantum transmissions is measured by the fidelity - a
similarity between input and output. In the case a pure (mixed) state
$|\phi_i\rangle$ ($\rho_i$) is transmitted with probability $p_i$ to the mixed state
$W_i$, the fidelity of the transmission is defined by
$\sum_i p_i \langle\phi_i|W_i|\phi_i\rangle$.

5. The study of the data compression methods and limitations. Schumacher’s 
quantum data compression theorem showed achievable theoretical limits for 
quantum data compression. The problem of universal, efficient and sim- 
ple compression methods has been solved to some degree. The subject of 
quantum data compression is, however, of such importance that much more 
research into the subject is needed. 
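
The fidelity formula given in item 4 above, $\sum_i p_i \langle\phi_i|W_i|\phi_i\rangle$, can be evaluated directly for small examples. The following is only a minimal sketch in plain Python: the function name `fidelity`, the representation of states as lists of amplitudes and of $W_i$ as lists of rows, and the sample numbers are our own illustrative assumptions, not part of the cited work.

```python
def fidelity(ensemble):
    """Average fidelity sum_i p_i <phi_i| W_i |phi_i>.

    ensemble: list of triples (p_i, phi_i, W_i), where phi_i is a list of
    amplitudes of a pure input state and W_i is the density matrix of the
    received (possibly mixed) state, given as a list of rows.
    This data layout is an assumption made for illustration.
    """
    total = 0.0
    for p, phi, w in ensemble:
        dim = len(phi)
        # Apply W_i to |phi_i>.
        w_phi = [sum(w[r][c] * phi[c] for c in range(dim)) for r in range(dim)]
        # Inner product <phi_i| (W_i |phi_i>); it is real for Hermitian W_i.
        overlap = sum(complex(phi[r]).conjugate() * w_phi[r] for r in range(dim))
        total += p * overlap.real
    return total


# Hypothetical example: |0> and |1> are sent with probability 1/2 each and
# arrive slightly mixed.
noisy = [
    (0.5, [1, 0], [[0.9, 0.0], [0.0, 0.1]]),
    (0.5, [0, 1], [[0.2, 0.0], [0.0, 0.8]]),
]
print(fidelity(noisy))
```

A noiseless transmission, where each $W_i$ equals $|\phi_i\rangle\langle\phi_i|$, gives fidelity 1 in this sketch.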



4.8 Bypassing Quantum Limitations 

Physical limitations cannot really be fully bypassed, but there is a strong need
to explore them more carefully from the information processing point of view:
what they really say, what the whole truth about them is, and how important
they really are, especially for quantum information processing.

Let us discuss from this point of view some of the limitations. 

Observe that an unknown quantum state cannot be copied and therefore a compression
of unknown quantum states has to be made “blindly”, without learning these states.
It is remarkable that quantum compression of known quantum states cannot be done
more efficiently than that of unknown states, due to Schumacher’s theorem.

1. No-cloning limitation. Perfect cloning is impossible. However, how about
an imperfect one? Imperfect copies can perhaps be useful in some cases.
How many copies, and how good ones, can we make using some (universal) gates?
What kind of properties do they have? Buzek et al. [10] have investigated such
problems.

How much is cloning actually needed and in which cases can it be bypassed?
For example, there was quite a strong belief until 1995 that quantum error
correcting codes are impossible, and the no-cloning theorem was used as one of
the main arguments.

2. No information gain without disturbance principle. There is a great need
for better insight into the dependence between the information
obtained from a quantum state and its disturbance. This is of crucial
importance for a deeper study of the security of quantum cryptography systems
and protocols.

3. Irreversibility of quantum measurements principle. In some special cases we
can, in reality, observe that quantum measurements are actually reversible.
For example, in the case of teleportation, where they even “do useful work”,
to transfer quantum information from one place to another, almost. The
cases in which quantum measurements are reversible need to be explored further.

4.9 Quantum Automata Theory 

Two problems seem to be of the main importance here: 

1. To develop quantum variants of the main classical models of
automata: finite automata, cellular automata, Turing machines and so on.
On the one hand, one has to study quantum models per se and, on the other hand,
also their relations to their classical counterparts - for example, with respect
to their recognition and descriptional power (see Gruska, 1999a).

2. To develop inherently quantum models of automata. 

4.10 Quantum Algorithms and Communication Protocols 

After the apt killing factorization and discrete logarithm algorithms of Shor and 
an influential algorithm of Grover, the progress in the area of the designing of 
quantum algorithms and networks has been steady, but not really spectacular, 
as was hoped for. At the same time, for the whole development of quantum
information processing technology, it is of utmost importance to see clearly the
benefits quantum computers could provide. Much more insight is therefore needed into
the real potential of quantum computing and communication.

There have been several attempts to develop methodologies to design
quantum algorithms, for example by amplitude amplification; see [21] for a review.

Recently, the so-called query complexity, in which one asks for the minimal
number of queries to an oracle, has emerged as an important part of quantum
complexity theory [14,21]. In addition, interesting relations between
computational complexity, query complexity and communication complexity have been
discovered - see [14] for an overview.

The emphasis also seems to be shifting from the investigation of
computational problems to the investigation of communication problems, where one
can make use of entanglement in a much more straightforward way.

It has been shown in the area of quantum communication complexity that
there are communication tasks for which quantum communication complexity
provides an exponentially better result than classical randomized communication
complexity [44].

Let us now present some particular challenges. 

1. Is there a polynomial time algorithm for the hidden subgroup problem for
non-commutative groups?

2. A search for zero-error quantum algorithms. Many polynomial time quantum 
algorithms provide their results with (small) errors. Can we find for the same 
problems also zero-error and still polynomial quantum algorithms? There is
already a variety of results in this area.

3. New basic quantum transforms need to be developed and investigated as 
tools to design quantum algorithms. 

4. Quantum distributed computing needs to be explored. In this setting there seems
to be a good chance to make significant use of entanglement.

Design of spatially distributed quantum networks and development of
quantum communication and cooperation protocols for distributed quantum
computing is one of the big challenges for the theory and practice of quantum
algorithms, protocols, computation and communication.

The basic idea is first to distribute entanglement among distant nodes
of the network and then to make use of the entanglement to make
communication and computation more efficient than in the classical case.

Some of the very basic problems behind this approach have already been
solved, at least theoretically, and various ways are being explored to obtain
efficient implementations. Quantum teleportation techniques, in combination
with quantum purification and distillation techniques, allow, in principle,
distribution of the entanglement through noisy channels. The range
of quantum communication can be extended to very distant nodes using
“quantum repeaters” that make use of quantum entanglement and quantum
error correcting techniques to restore quantum signals without reading
quantum information and in this way to bypass limitations imposed by the quantum
no-cloning theorem.

Hidden subgroup problem

Given: An (efficiently computable) function $f : G \to R$, where $G$ is a finite group
and $R$ a finite set.

Promise: There exists a subgroup $G_0 \le G$ such that $f$ is constant and distinct
on the cosets of $G_0$.

Task: Find a generating set for $G_0$ (in a number of calls to the oracle for $f$
that is polynomial in $\lg |G|$, and in overall polynomial time).

This problem is a natural generalization of several problems that have been
investigated in quantum computing, for example, the Simon problem, the integer
factorization problem and the discrete logarithm problem.

(a) Order-finding problem: $G = \mathbb{Z}$, $a \in \mathbb{N}$, $f(x) = a^x$,
$x - y \in G_0 \Leftrightarrow f(x) = f(y)$,
$G_0 = \{rk \mid k \in \mathbb{Z}\}$ for the smallest $r$ such that $a^r = 1$. Find $r$.

(b) Discrete logarithm problem: $G = \mathbb{Z}_r \times \mathbb{Z}_r$, $a^r = 1$,
$a, b \in \mathbb{N}$, $f(x,y) = a^x b^y$,
$f(x_1,y_1) = f(x_2,y_2) \Leftrightarrow (x_1,y_1) - (x_2,y_2) \in G_0$,
$G_0 = \{(k, -km) \mid k \in \mathbb{Z}_r\}$. Find $G_0$ (or $m$).

It has been shown that there is a polynomial time algorithm for the hidden
subgroup problem for the case of Abelian groups. The non-commutative case is of
interest not only as a natural generalization, but also because the graph isomorphism
problem falls into this category. The existence of polynomial time algorithms has
been shown already for some non-commutative groups (see [21] for an overview and
references), but the general case remains open.
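
The order-finding instance of the hidden subgroup problem stated in the note of this section can be made concrete with a small classical experiment. The sketch below is purely illustrative and is not a quantum algorithm: it finds $r$ by exhaustive search, which takes time exponential in $\lg |G|$. The names `f`, `order` and `hides_subgroup`, and the explicit modulus in which $a^r = 1$ is evaluated, are our own assumptions.

```python
from math import gcd


def f(a, modulus, x):
    """The function f(x) = a^x, evaluated modulo an assumed modulus."""
    return pow(a, x, modulus)


def order(a, modulus):
    """Smallest r > 0 with a^r = 1 (mod modulus), found by exhaustive search."""
    if modulus < 2 or gcd(a, modulus) != 1:
        raise ValueError("a must be coprime to a modulus >= 2")
    r, value = 1, a % modulus
    while value != 1:
        value = (value * a) % modulus
        r += 1
    return r


def hides_subgroup(a, modulus, r, bound=50):
    """Check on 0..bound-1 that f(x) = f(y) exactly when x - y is a multiple of r."""
    return all(
        (f(a, modulus, x) == f(a, modulus, y)) == ((x - y) % r == 0)
        for x in range(bound)
        for y in range(bound)
    )


r = order(7, 15)
print(r, hides_subgroup(7, 15, r))
```

The check confirms, on a finite range, that $f$ is constant on each coset of $G_0 = \{rk\}$ and takes distinct values on different cosets, which is exactly the promise of the problem.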

4.11 Foundational Issues 

1. Non-locality is, naturally, one of the key topics in the area of very
fundamental issues. There are many fundamental questions related to this problem.
For example, to what extent and in which sense is quantum information
global? It is therefore natural that this subject keeps attracting the attention
of those not very happy with the current state of understanding and
of those hoping to make fundamental contributions. Einstein’s criticism of
current quantum theory is still not without admirers and followers, and even
though the hidden variable approach did not succeed, the search for other variants
has not stopped. See, for example, the recent paper by Deutsch and
Hayden [16]. Concerning other recent results, let us mention the discovery
of non-locality without entanglement [4].

2. A search for “new physics”, for different theories of the physical world -
especially those approaches motivated by QIP problems, paradigms and
insights.

3. New QIP paradigms. This is a natural and long-standing goal. The recently
developed idea of quantum computation with continuous variables (or in
infinite dimensional Hilbert spaces) falls into this framework.

4.12 Quantum Paradoxical Phenomena 

There is a variety of fascinating paradoxical quantum phenomena that need to
be studied in depth. Quantum non-locality is one of them. There are a few others
worth investigating.

1. Quantum teleportation. Though mathematically simple, quantum telepor- 
tation is a puzzling phenomenon. How is it possible that we can “instanta- 
neously”, within two bits, transfer an unknown quantum state? (An inter- 
esting explanation, in terms of the many world interpretation, is given by 
Vaidman [46]: He takes the position that the teleportation procedure does not
move the unknown quantum state. The state was at Bob’s place from
the very beginning. The local measurement performed by Alice only splits 
the world in such a way that in each of the new worlds the state of Bob’s 
particle differs from the unknown state only by one of the well known trans- 
formations, and the number of these transformations happens to be small, 
just 4.) 
2. Counterfactual computations. They are processes by which one can learn
the results of a computation without actually performing the computation,
provided the possibility to perform that computation is available, even
though the computation itself has not been performed. (Potentials and
limitations of counterfactual computations have been recently studied by
Mitchison and Jozsa [39].)

4.13 Exploring Relations and Impacts to Other Sciences 

Even if the design of quantum computation and communication systems is a very
important goal of the research in the area of QIP, it is not the only one. It is
also important to identify areas outside of QIP where results of QIP theory
could help to solve major problems and to make significant contributions to other
sciences, and then to concentrate also on the development of the corresponding
areas of QIP. Let us mention some such problems.

1. Simulating quantum physics. Even a 10-20 qubit processor, which seems to
be feasible, should be a powerful tool for simulating quantum phenomena, one that
could be used by scientists in various areas to perform experiments, for
example in nuclear physics, material sciences, molecular biology, . . .

2. A search for interfaces between physics and (quantum) information (pro- 
cessing). An understanding seems to be emerging that some problems of 
physics and physical phenomena can be elucidated by taking QIP points 
of view, paradigms, concepts and methods, especially phenomena exhibiting
multi-particle entanglement. Moreover, some important principles have
already been formulated in terms of quantum information. For example, the
“holographic principle” says that quantum information encoded in a spatial
volume can be read completely on the surface that bounds the volume.
According to Preskill [43], “The holographic viewpoint is particularly powerful in the
case of the quantum behaviour of a black hole. The information that disap- 
pears behind the event horizon can be completely encoded on the horizon, 
and so can be transferred to the outgoing Hawking radiation that is emitted 
as the black hole evaporates. This way, the evaporation process need not 
destroy any quantum information.” 

3. Impacts on physics (theoretical and experimental). 

(a) Deepening of our understanding of quantum mechanics (especially 
through enhancement of our knowledge on such issues as non-locality, 
measurements, entanglement, decoherence and so on). 

(b) Improvement of information gathering capabilities of physical
experiments [43,13]. The authors expect that progress in QIP can bring new strategies
to perform high precision measurements (such as those needed for the detection
of gravitational waves and other weak forces). Progress in QIP could
help, for example, to develop a theory of distinguishability of
superoperators (a measure of the amount of information one needs to extract in
order to be able to distinguish two superoperators, which could
provide limitations on precision measurements), and for that they envision
and analyse the utilization of even such QIP tools and methods as the quantum
Fourier transform, Grover’s search method and superdense coding.

In general, one can expect that the ideas emerging from the QIP theory
can have an impact on experimental physics. Especially the understanding
and utilization of entanglement can influence experimental methods.

(c) An understanding of the behaviour of strongly-coupled many-body
systems, one of the most challenging problems in quantum dynamics [43].
This can be helped by a deeper study of multi-particle entanglement.

4. Determination of the ultimate limitations for computation. This is a
challenging task, both for computing and physics. As analysed by Lloyd [34], the
speed with which a physical device can process information is limited by its
energy (needed to switch from one state to another, distinguishable, state),
and therefore on the basis of quantum mechanics one can determine that for
a computer of 1 kg of mass and of 1 l of volume, the maximal ultimate speed
is $2.7 \times 10^{50}$ bit operations per second. It seems harder to determine the
number of bits such an “ultimate laptop” can store in general (entropy limits the
amount of information a computer can store), but it can be determined for
a computer that has been compressed to form a black hole (to be $3.8 \times 10^{16}$
bits). Estimations are based on such constants of physics as the speed of
light, the Planck constant and the gravitational constant, and do not make
any computer architecture assumption. (The question of how black holes deal
with information is an interesting problem of current physics - whether they
destroy information or not - see also the holographic principle. It seems that
new physics is needed to understand these problems.) The need to deal with
errors can be an important factor why the above limits, based only on such
characteristics of a computer as mass and volume, can be hard to achieve.
Abstract. To specify the set of tractable (practically solvable) computing
problems is one of the few main research tasks of theoretical computer
science. Because of this, the investigation of the possibility or
impossibility of efficiently computing approximations of hard optimization
problems has become one of the central and most fruitful areas of
recent algorithm and complexity theory. The current point of view is
that optimization problems are considered to be tractable if there exist
polynomial-time randomized approximation algorithms that solve them
with a reasonable approximation ratio. If an optimization problem does
not admit such a polynomial-time algorithm, then the problem is
considered to be not tractable.

The main goal of this paper is to relativize this specification of tractability.
The main reason for this attempt is that we consider the requirement
for tractability to be too strong because of the definition of
complexity as “worst-case” complexity. The worst-case view also applies to
the approximation ratio of approximation algorithms, and so an optimization
problem is considered to be intractable because some subset of
problem instances is hard. But in practice we often have the
situation that the hard problem instances do not occur. The general idea
of this paper is to try to partition the set of all problem instances of
a hard optimization problem into a (possibly infinite) spectrum of
subclasses according to their polynomial-time approximability. Searching for
a method enabling such a fine problem analysis (classification of problem
instances) we introduce the concept of stability of approximation.
To show that the application of this concept may lead to a “fine”
characterization of the hardness of particular problem instances we consider
the traveling salesperson problem and the knapsack problem.



1 Introduction 

Historically, the first informally formulated questions about computability and
decidability date back to the turn of the century, when the great mathematician
David Hilbert asked whether it is possible to encode all mathematical problems in
some suitable formalism and to determine (decide) their truth by an algorithm.
The landmark paper by Kurt Gödel [11] showed that this is impossible. Moreover,
this paper led to the formal definition of the notion of algorithm and so laid the
foundations of theoretical computer science. So, it became possible to formally pose
questions of the following kind:

“Is a given problem algorithmically solvable (computable, decidable)?”

This led to the development of the computability theory (for a nice overview 
see [22]), where mathematical methods determining the existence or non- 
existence of algorithms for different mathematical problems were created. With 
the development of computer technologies a new, more involved question than
the original one became the central one:

“If a problem is computable, how much physical (computer) work is
necessary and sufficient to algorithmically solve the problem?”

This question initiated the algorithm and complexity theory in
the sixties. With the measurement of the complexity of algorithms, the question
“What is computable?” was replaced by the question

“What is practically computable?”

But to answer this question one has first to specify (formalize) the meaning of the
notion “practically computable”, called tractable in the current literature. The
history of the development of complexity and algorithm theory can be more or 
less viewed as the history of the development of our opinion on the specification 
of tractability. 

The first specification was done already in the sixties, when people decided 
to consider a computing problem to be tractable if there exists an algorithm 
that solves the problem in polynomial-time in the input length. A problem is 
called intractable if it does not admit any polynomial-time algorithm. The main 
reasons for this decision were the following two: 

(i) (a more theoretical reason) The class of polynomial-time computations is
robust in the sense that its specification is independent of the computing
model (formalism) chosen, if the model is reasonable for complexity
measurement.^

(ii) (a more practical reason) If a problem had a polynomial-time algorithm,
then usually the people were able to find an at most 0{n^) time algorithm 
for it. Such an algorithm is really practical because one can use it to solve 
problem instances of sizes that realistically appear in the practice. On the 
other hand, if the complexity of an algorithm is $2^n$, where $n$ is the input size,
then already for the realistic input size $n = 100$, $2^{100}$ is a 31-digit number. Since
the number of microseconds since the “Big Bang” has 24 digits, it is
clear that there is no chance to apply this algorithm.
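
The digit counts used in reason (ii) are easy to check. The short sketch below does so in Python; the age of the universe (about 13.8 billion years) and the length of a year are assumed values that we supply only for this illustration, and the helper name `digits` is ours.

```python
def digits(n):
    """Number of decimal digits of a positive integer n."""
    return len(str(n))


# Assumed values for the illustration (not taken from the text above).
AGE_OF_UNIVERSE_YEARS = 13_800_000_000
SECONDS_PER_YEAR = 31_557_600          # 365.25 days
MICROSECONDS_PER_SECOND = 1_000_000

steps = 2 ** 100
microseconds_since_big_bang = (
    AGE_OF_UNIVERSE_YEARS * SECONDS_PER_YEAR * MICROSECONDS_PER_SECOND
)

print(digits(steps), digits(microseconds_since_big_bang))

# How many ages of the universe an algorithm with 2^100 steps would need
# at one step per microsecond.
print(steps // microseconds_since_big_bang)
```

Even at one step per microsecond, $2^{100}$ steps exceed the assumed age of the universe by a factor of several million.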

Considering this classification (specification of tractability) the main problem
becomes to find a method for proving that a computing problem does not admit
any polynomial-time algorithm, i.e., that it is intractable. Because no
mathematical proof of intractability for a language from NP has been found up till
now, and the most interesting problems in practice have their decision versions
in NP, Cook introduced the concept of NP-completeness in his seminal paper [9].
This concept made it possible, under the assumption P ≠ NP, to prove the
intractability of thousands of computing problems.

^ Note that this kind of robustness is a reasonable requirement that should be fulfilled
by every attempt to define the class of tractable problems.

Immediately after introducing NP-hardness (completeness) [9] as a concept 
for proving intractability of computing problems [16], the following question has 
been posed: 

“If an optimization problem does not admit any efficiently computable
optimal solution, is there a possibility to efficiently compute at least an
approximation of the optimal solution?”

Several researchers (see, for instance [15,17,7,14]) provided already in the middle
of the seventies a positive answer for some optimization problems. It may seem
to be a fascinating effect if one jumps from exponential complexity (a huge
inevitable amount of physical work) to polynomial complexity (a tractable
amount of physical work) due to a small change in the requirement - instead of an
exact optimal solution one demands a solution whose quality differs from the quality
of an optimal solution by at most $\delta \cdot 100\,\%$ for some $\delta$. This effect is very strong,
especially if one considers problems for which this approximation concept works
for any small relative difference $\delta$ (see the concept of approximation schemes
in [14,19,21,4]).

On the other hand we know several computing problems that admit efficient 
polynomial-time randomized algorithms, but no polynomial-time deterministic 
algorithm is known for them.^ Usually the randomized algorithms are practical 
because the probability to compute the right output can be increased to $1 - \delta$
for any small $\delta > 0$.
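
One standard way to increase the success probability, sketched here only as an illustration, is to repeat a bounded-error algorithm independently and take a majority vote. The sketch assumes a decision problem and an algorithm that is correct with a fixed probability $p > 1/2$ on every run; the function names are ours.

```python
from math import comb


def majority_error(p, k):
    """Probability that a majority vote over k independent runs is wrong.

    p is the (assumed) probability that a single run is correct; k is odd,
    so the vote fails exactly when at most k // 2 runs are correct.
    """
    q = 1.0 - p
    return sum(comb(k, i) * p**i * q**(k - i) for i in range(k // 2 + 1))


def runs_needed(p, delta):
    """Smallest odd k whose majority vote errs with probability at most delta."""
    if not 0.5 < p <= 1.0 or delta <= 0.0:
        raise ValueError("need 1/2 < p <= 1 and delta > 0")
    k = 1
    while majority_error(p, k) > delta:
        k += 2
    return k


print(majority_error(0.75, 3), runs_needed(0.75, 1e-6))
```

The error of the vote decreases exponentially in the number of runs, so the number of repetitions grows only logarithmically in $1/\delta$.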

These are the reasons why currently optimization problems are considered
to be tractable if there exist randomized polynomial-time approximation
algorithms that solve them with a reasonable approximation ratio. In what follows
an $\alpha$-approximation algorithm for a minimization [maximization] problem is an
algorithm that provides feasible solutions whose cost divided by the cost of
optimal solutions is at most $\alpha$ [is at least $1/\alpha$].

There is also another possibility to jump from NP to P. Namely, to consider
the subset of inputs with a special, nice property instead of the whole set of
inputs for which the problem is well-defined. A nice example is the Traveling
Salesperson Problem (TSP). TSP is not only NP-hard, but also the search for
an approximate solution for TSP is NP-hard for every $\delta$.^ But if one
considers TSP for inputs satisfying the triangle inequality (so called $\Delta$-TSP), one
can even design a $3/2$-approximation algorithm [7]. The situation is still more
interesting, if one considers the Euclidean TSP, where the distances between the
nodes correspond to the distances in the Euclidean metric. The Euclidean TSP
is NP-hard [20], but for every small $\delta > 0$ one can design a polynomial-time
$(1 + \delta)$-approximation algorithm [2,3,18].^

^ Currently, we do not know any randomized bounded-error polynomial-time
algorithm for an NP-hard problem.

^ In fact, under the assumption P ≠ NP, there is no $p(n)$-approximation algorithm for
TSP for any polynomial $p$.

Exactly this second approach for jumping from NP to P is the reason why we
propose again to revise the notion of tractability of optimization problems. This
approach shows that the main drawback of the current definition of tractability
is in the definition of complexity and approximability in the worst-case
manner. Because of some hard problem instances a problem is considered to be
intractable. But if one can specify the border between the hard problem instances
and the easy ones, and one can efficiently decide the membership of a given
instance in one of these two classes, then this computing problem may finally be
considered to be tractable in several applications. Our general idea is to try to
split the set of all input sequences of the given problem into possibly infinitely
many subclasses according to the hardness of their polynomial-time
approximability. To provide a method that is able to achieve this goal, we introduce the
concept of approximation stability.

Informally, one can describe the idea of our concept by the following scenario.
One has an optimization problem $P$ for two sets of input instances $L_1$ and $L_2$,
$L_1 \subset L_2$. For $L_1$ there exists a polynomial-time $\alpha$-approximation algorithm $A$,
but for $L_2$ there is no polynomial-time $\gamma$-approximation algorithm for any $\gamma > 0$
(if NP is not equal to P). We pose the following question: Is the algorithm $A$
really useful for inputs from $L_1$ only? Let us consider a distance measure $M$ in
$L_2$ determining the distance $M(x)$ between $L_1$ and any $x \in L_2$. Now, one can
consider an input $x \in L_2 - L_1$, with $M(x) \le k$ for some positive real $k$. One
can look for how “good” the algorithm $A$ is for the input $x \in L_2 - L_1$. If for
every $k > 0$ and every $x$ with the distance at most $k$ to $L_1$, $A$ computes a
$\delta_{\alpha,k}$-approximation of an optimal solution for $x$ ($\delta_{\alpha,k}$ is considered to be a constant
depending on $k$ and $\alpha$ only), then one can say that $A$ is “(approximation) stable”
according to $M$.

Obviously, the idea of the concept of approximation stability is similar to 
that of stability of numerical algorithms. But instead of observing the size of the 
change of the output value according to a small change of the input value, we 
look for the size of the change of the approximation ratio according to a small 
change in the specification (some parameters, characteristics) of the set of input 
instances considered. If the change of the approximation ratio is small for
every small change in the specification of the set of input instances, then we 
have a stable algorithm. If a small change in the specification of the set of input 
instances causes an essential (including the size of the input instances) increase 
of the relative error, then the algorithm is unstable. 
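
To make the role of a distance measure more tangible, the following sketch computes one possible measure for TSP instances: how far a cost matrix is from satisfying the triangle inequality. This particular measure, the name `triangle_distance` and the sample matrices are assumptions chosen for illustration; the text above only requires that some measure $M$ between an instance and the “nice” class exists.

```python
from itertools import permutations


def triangle_distance(c):
    """Distance of a TSP cost matrix from the triangle inequality.

    c is a symmetric matrix (list of rows) with positive off-diagonal costs.
    The value is 0 when c(u, v) <= c(u, p) + c(p, v) holds for all triples,
    and otherwise the largest relative violation of that inequality.
    """
    worst = 0.0
    for u, p, v in permutations(range(len(c)), 3):
        worst = max(worst, c[u][v] / (c[u][p] + c[p][v]) - 1.0)
    return worst


metric = [[0, 1, 2], [1, 0, 1], [2, 1, 0]]      # satisfies the triangle inequality
relaxed = [[0, 1, 3], [1, 0, 1], [3, 1, 0]]     # edge of cost 3 violates it
print(triangle_distance(metric), triangle_distance(relaxed))
```

In the scenario above, instances with distance 0 would form $L_1$, and an algorithm would be called stable with respect to this measure if its approximation ratio on instances with distance at most $k$ is bounded by a constant depending only on $k$ and on its ratio on $L_1$.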



^ Obviously, we know a lot of similar examples where by restricting the set of inputs
one crosses the border between decidability and undecidability (Post
Correspondence Problem) or the border between P and NP (SAT and 2-SAT, or vertex cover
problem).
The concept of stability enables us to show positive results extending the
applicability of known approximation algorithms. As we shall see later, the concept
motivates us to modify an unstable algorithm A in order to get a stable
algorithm B that achieves the same approximation ratio on the original set of input
instances as A has, but B can be successfully used also outside of the original 
set of input instances. This concept seems to be useful because there are many 
problems for which an additional assumption on the “parameters” of the input 
instances leads to an essential decrease of the hardness of the problem. Thus, 
such “effects” are the starting points for trying to partition the whole set of 
input instances into a spectrum of classes according to approximability. 

This paper is organized as follows: In Section 2 we present our concept of
approximation stability. Sections 3 and 4 illustrate the usefulness of this
concept by applying it to TSP and the knapsack problem, respectively. Section 5 is
devoted to a general discussion about possible applications of this concept.

2 The Concept of the Stability of Approximation 

We assume that the reader is familiar with the basic concepts and notions 
of algorithmics and complexity theory as presented in standard textbooks 
like [4,10,13,21,23]. Next, we give a new (a little bit revised) definition of the 
notion of an optimization problem. The reason to do this is to obtain the pos- 
sibility to study the influence of the input sets on the hardness of the problem 
considered. Let $\mathbb{N} = \{0,1,2,\ldots\}$ be the set of nonnegative integers, and let $\mathbb{R}^+$
be the set of positive reals.

Definition 1. An optimization problem $U$ is a 7-tuple
$U = (\Sigma_I, \Sigma_O, L, L_I, \mathcal{M}, cost, goal)$, where

(i) $\Sigma_I$ is an alphabet called input alphabet,

(ii) $\Sigma_O$ is an alphabet called output alphabet,

(iii) $L \subseteq \Sigma_I^*$ is a language over $\Sigma_I$ called the language of
consistent inputs,

(iv) $L_I \subseteq L$ is a language over $\Sigma_I$ called the language of actual inputs,

(v) $\mathcal{M}$ is a function from $L$ to $2^{\Sigma_O^*}$, where, for every $x \in L$, $\mathcal{M}(x)$ is
called the set of feasible solutions for the input $x$,

(vi) cost is a function, called cost function, from $\bigcup_{x \in L} \mathcal{M}(x)$ to $\mathbb{R}^+$,

(vii) $goal \in \{minimum, maximum\}$.

For every $x \in L$, we define

$$Output_U(x) = \{y \in \mathcal{M}(x) \mid cost(y) = goal\{cost(z) \mid z \in \mathcal{M}(x)\}\},$$

and

$$Opt(x) = cost(y) \text{ for some } y \in Output_U(x).$$

Clearly, the meaning for $\Sigma_I$, $\Sigma_O$, $\mathcal{M}$, cost and goal is the usual one. $L$
may be considered as a set of consistent inputs, i.e., the inputs for which the
optimization problem is consistently defined. $L_I$ is the set of inputs considered
and only these inputs are taken into account when one determines the complexity
of the optimization problem $U$. This kind of definition is useful for considering the
complexity of optimization problems parametrized according to their languages
of actual inputs. In what follows $Language(U)$ denotes the language $L_I$ of
actual inputs of $U$.

To illustrate Definition 1 consider the Knapsack Problem (KP). In what follows
$bin(u)$ denotes the integer binary coded by the string $u \in \{0,1\}^*$. The
input of KP consists of $2n+1$ integers $w_1, w_2, \ldots, w_n, b, c_1, c_2, \ldots, c_n$,
$n \in \mathbb{N}$. So, one can consider $\Sigma_I = \{0,1,\#\}$ with the binary coding of
integers and # for ",". The output is a vector $x \in \{0,1\}^n$, and so we set
$\Sigma_O = \{0,1\}$. $L = \{0,1\}^* \cdot \bigcup_{i=0}^{\infty} (\#\{0,1\}^*)^{2i}$. We speak
about the Simple Knapsack Problem (SKP) if $w_i = c_i$ for every $i = 1,\ldots,n$.
So, we can consider

$$L_I = \{z_1\#z_2\#\ldots\#z_n\#b\#z_1\#z_2\#\ldots\#z_n \mid b, z_i \in \{0,1\}^* \text{ for } i = 1,\ldots,n,\ n \in \mathbb{N}\}$$

as a subset of $L$. $\mathcal{M}$ assigns to each $I = y_1\#\ldots\#y_n\#b\#u_1\#\ldots\#u_n$ the set of
words

$$\mathcal{M}(I) = \{x = x_1x_2\ldots x_n \in \{0,1\}^n \mid \sum_{i=1}^{n} x_i \cdot bin(y_i) \le bin(b)\}.$$

For every $x = x_1\ldots x_n \in \mathcal{M}(I)$, $cost(x) = \sum_{i=1}^{n} x_i \cdot bin(u_i)$. The goal is
maximum. So, $KP = (\Sigma_I, \Sigma_O, L, L, \mathcal{M}, cost, goal)$, and
$SKP = (\Sigma_I, \Sigma_O, L, L_I, \mathcal{M}, cost, goal)$.

Definition 2. Let $U = (\Sigma_I, \Sigma_O, L, L_I, \mathcal{M}, cost, goal)$ be an optimization
problem. We say that an algorithm A is a consistent algorithm for U if, for every
input $x \in L_I$, A computes an output $A(x) \in \mathcal{M}(x)$. We say that A solves U
if, for every $x \in L_I$, A computes an output $A(x)$ from $Output_U(x)$. The time
complexity of A is defined as the function

$$Time_A(n) = \max\{Time_A(x) \mid x \in L_I \cap \Sigma_I^n\}$$

from $\mathbb{N}$ to $\mathbb{N}$, where $Time_A(x)$ is the length of the computation of A on x.

Next, we give the definitions of standard notions in the area of approximation 
algorithms (see e.g. [8,13]). 

Definition 3. Let $U = (\Sigma_I, \Sigma_O, L, L_I, \mathcal{M}, cost, goal)$ be an optimization
problem, and let A be a consistent algorithm for U. For every $x \in L_I$, the relative
error $\varepsilon_A(x)$ is defined as

$$\varepsilon_A(x) = \frac{|cost(A(x)) - Opt_U(x)|}{Opt_U(x)}.$$

For any $n \in \mathbb{N}$, we define the relative error of A as

$$\varepsilon_A(n) = \max\{\varepsilon_A(x) \mid x \in L_I \cap \Sigma_I^n\}.$$

For every $x \in L_I$, the approximation ratio $R_A(x)$ is defined as

$$R_A(x) = \max\left\{\frac{cost(A(x))}{Opt_U(x)}, \frac{Opt_U(x)}{cost(A(x))}\right\}.$$

For minimization problems, $R_A(x) = 1 + \varepsilon_A(x)$.

For any $n \in \mathbb{N}$, we define the approximation ratio of A as

$$R_A(n) = \max\{R_A(x) \mid x \in L_I \cap \Sigma_I^n\}.$$

For any positive real $\delta > 1$, we say that A is a $\delta$-approximation algorithm
for U if $R_A(x) \le \delta$ for every $x \in L_I$.

For every function $f : \mathbb{N} \to \mathbb{R}^+$, we say that A is an $f(n)$-approximation
algorithm for U if $R_A(n) \le f(n)$ for every $n \in \mathbb{N}$.
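
The two quality measures of Definition 3 can be made concrete with a short sketch. The function names `relative_error` and `approximation_ratio` are illustrative and not part of the paper; the arguments stand for $cost(A(x))$ and $Opt_U(x)$ of a single input x.

```python
def relative_error(cost_a: float, opt: float) -> float:
    """Relative error |cost(A(x)) - Opt(x)| / Opt(x) for one input."""
    return abs(cost_a - opt) / opt


def approximation_ratio(cost_a: float, opt: float) -> float:
    """Approximation ratio max{cost(A(x))/Opt(x), Opt(x)/cost(A(x))}."""
    return max(cost_a / opt, opt / cost_a)
```

For a minimization instance with optimum 10 and computed cost 12 both measures agree up to the additive 1, while for a maximization instance with computed cost 8 the ratio 1.25 exceeds 1 plus the relative error 0.2.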

The best that can happen for a hard optimization problem U is that one
has a polynomial-time $(1+\varepsilon)$-approximation algorithm for U for any $\varepsilon > 0$.
In that case we call this collection of approximation algorithms a polynomial-time
approximation scheme (PTAS) for U. A nicer definition of a PTAS
than the one above is the following. An algorithm B is a PTAS for an
optimization problem U if B computes an output $B(x,\varepsilon) \in \mathcal{M}(x)$ for every
input $(x,\varepsilon) \in Language(U) \times \mathbb{R}^+$ with

$$\frac{|cost(B(x,\varepsilon)) - Opt_U(x)|}{Opt_U(x)} \le \varepsilon$$

in time polynomial according to $|x|$. If the time complexity of B can be bounded
by a function that is polynomial in both $|x|$ and $\varepsilon^{-1}$, then we say that B is
a fully polynomial-time approximation scheme (FPTAS) for U.

Now, we define the complexity classes of optimization problems in the usual 
way (see e.g. [13,19]). 

Definition 4.

$NPO = \{U = (\Sigma_I, \Sigma_O, L, L_I, \mathcal{M}, cost, goal) \mid$
U is an optimization problem, $L, L_I \in P$;
for every $x \in L$, $\mathcal{M}(x) \in P$; cost is computable in polynomial time$\}$.

For every optimization problem $U = (\Sigma_I, \Sigma_O, L, L_I, \mathcal{M}, cost, goal)$, the
underlying language of U is

$$Under_U = \{(w,k) \mid w \in L_I,\ k \in \mathbb{N} - \{0\},\ Opt_U(w) \ge k\}$$

if goal = maximum. Analogously, if goal = minimum

$$Under_U = \{(w,r) \mid w \in L_I,\ r \in \mathbb{N} - \{0\},\ Opt_U(w) \le r\}.$$

$PO = \{U \in NPO \mid Under_U \in P\}$

$APX = \{U \in NPO \mid$ there exists an $\varepsilon$-approximation algorithm for U for some
$\varepsilon \in \mathbb{R}^+\}$.

In order to define the notion of stability of approximation algorithms we
need to consider something like a distance between a language L and a word
outside L.

Definition 5. Let $U = (\Sigma_I, \Sigma_O, L, L_I, \mathcal{M}, cost, goal)$ and
$\overline{U} = (\Sigma_I, \Sigma_O, L, L, \mathcal{M}, cost, goal)$ be two optimization problems with
$L_I \subset L$. A distance function for $\overline{U}$ according to $L_I$ is any function
$h : L \to \mathbb{R}^{\ge 0}$ satisfying the properties

(i) $h(x) = 0$ for every $x \in L_I$.

(ii) h can be computed in polynomial time.

We define, for any $r \in \mathbb{R}^+$,

$$Ball_{r,h}(L_I) = \{w \in L \mid h(w) \le r\}.$$

Let A be a consistent algorithm for $\overline{U}$, and let A be an $\varepsilon$-approximation
algorithm for U for some $\varepsilon \in \mathbb{R}^+$. Let p be a positive real. We say that A
is p-stable according to h if, for every real $0 < r \le p$, there exists
a $\delta_{r,\varepsilon} \in \mathbb{R}^+$ such that A is a $\delta_{r,\varepsilon}$-approximation algorithm for
$U_r = (\Sigma_I, \Sigma_O, L, Ball_{r,h}(L_I), \mathcal{M}, cost, goal)$.^5

A is stable according to h if A is p-stable according to h for every $p \in \mathbb{R}^+$.
We say that A is unstable according to h if A is not p-stable for any $p \in \mathbb{R}^+$.

For every positive integer r, and every function $f_r : \mathbb{N} \to \mathbb{R}^+$ we say that
A is $(r, f_r(n))$-quasistable according to h if A is an $f_r(n)$-approximation
algorithm for $U_r = (\Sigma_I, \Sigma_O, L, Ball_{r,h}(L_I), \mathcal{M}, cost, goal)$.

We see that the existence of a stable c-approximation algorithm for U immediately
implies the existence of a $\delta_{r,c}$-approximation algorithm for $U_r$ for any
$r > 0$. Note that applying the stability to PTASs one can get two different
outcomes. Consider a PTAS A as a collection of polynomial-time
$(1+\varepsilon)$-approximation algorithms $A_\varepsilon$ for every $\varepsilon \in \mathbb{R}^+$. If $A_\varepsilon$ is stable
according to a distance measure h for every $\varepsilon > 0$, then we can obtain either

(i) a PTAS for $U_r = (\Sigma_I, \Sigma_O, L, Ball_{r,h}(L_I), \mathcal{M}, cost, goal)$ for every
$r \in \mathbb{R}^+$ (this happens, for instance, if $\delta_{r,\varepsilon} = 1 + \varepsilon \cdot f(r)$, where f is an
arbitrary function), or

(ii) a $\delta_{r,\varepsilon}$-approximation algorithm for $U_r$ for every $r \in \mathbb{R}^+$, but no PTAS
for $U_r$ for any $r \in \mathbb{R}^+$ (this happens, for instance, if $\delta_{r,\varepsilon} = 1 + r + \varepsilon$).

To capture these two different situations we introduce the notion of
“super-stability” as follows:

Definition 6. Let $U = (\Sigma_I, \Sigma_O, L, L_I, \mathcal{M}, cost, goal)$ and
$\overline{U} = (\Sigma_I, \Sigma_O, L, L, \mathcal{M}, cost, goal)$ be two optimization problems with
$L_I \subset L$. Let h be a distance function for $\overline{U}$ according to $L_I$, and let
$U_r = (\Sigma_I, \Sigma_O, L, Ball_{r,h}(L_I), \mathcal{M}, cost, goal)$ for every $r \in \mathbb{R}^+$.
Let $A = \{A_\varepsilon\}_{\varepsilon > 0}$ be a PTAS for U.

If, for every $r > 0$ and every $\varepsilon > 0$, $A_\varepsilon$ is a $\delta_{r,\varepsilon}$-approximation algorithm for
$U_r$, we say that the PTAS A is stable according to h.

If $\delta_{r,\varepsilon} \le f(\varepsilon) \cdot g(r)$, where

(i) f and g are some functions from $\mathbb{R}^{\ge 0}$ to $\mathbb{R}^+$, and
(ii) $\lim_{\varepsilon \to 0} f(\varepsilon) = 0$,

then we say that the PTAS A is super-stable according to h.

^5 Note that $\delta_{r,\varepsilon}$ is a constant depending on r and $\varepsilon$ only.



Remark 1. If A is a super-stable (according to a distance function h) PTAS for
an optimization problem $U = (\Sigma_I, \Sigma_O, L, L_I, \mathcal{M}, cost, goal)$, then A is a PTAS
for the optimization problem $U_r = (\Sigma_I, \Sigma_O, L, Ball_{r,h}(L_I), \mathcal{M}, cost, goal)$ for
any $r \in \mathbb{R}^+$.

One may see that the notions of stability can be useful for answering the
question of how broadly a given approximation algorithm is applicable. So, if one
is interested in positive results then one is looking for a suitable distance measure
that enables one to use the algorithm outside the originally considered set of inputs.
In this way one can search for the border of the applicability of the given algo- 
rithm. If one is interested in negative results then one can try to show that for 
any reasonable distance measure the considered algorithm cannot be extended 
to work for a much larger set of inputs than the original one. In this way one can 
search for fine boundaries between polynomial approximability and polynomial 
non-approximability. 



3 Stability of Approximation and Traveling Salesperson 
Problem 

We consider the well-known TSP, which in its general form is very hard
to approximate. But if one considers complete graphs in which the triangle
inequality holds, then we have the following two approximation algorithms for
the $\Delta$-TSP.



Algorithm ST (Spanning Tree)

Input: a complete graph $G = (V, E)$, and a cost function $c : E \to \mathbb{N}^+$ satisfying
the triangle inequality ($c(\{u,v\}) \le c(\{u,w\}) + c(\{w,v\})$ for all three pairwise
different $u, v, w \in V$).

Step 1: Construct a minimal spanning tree T of G according to c.

Step 2: Choose an arbitrary vertex $v \in V$. Perform a depth-first search of T from
v, and order the vertices in the order in which they are visited. Let H be the
resulting sequence.

Output: The Hamiltonian tour $\overline{H} = H, v$.

Christofides Algorithm

Input: a complete graph $G = (V, E)$, and a cost function $c : E \to \mathbb{N}^+$ satisfying
the triangle inequality.

Step 1: Construct a minimal spanning tree T of G according to c.

Step 2: $S := \{v \in V \mid deg_T(v) \text{ is odd}\}$.

Step 3: Compute a minimum-weight perfect matching M on S in G.

Step 4: Create the multigraph $G' = (V, E(T) \cup M)$ and construct an Euler tour
w in G'.

Step 5: Construct a Hamiltonian tour H of G by shortening w (realize it by
removing all repetitions of the occurrences of every vertex in w, in one run
via w from the left to the right).

Output: H
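
The two steps of Algorithm ST can be sketched as follows. This is only an illustration under assumptions that are not in the paper: the graph is given as a symmetric cost matrix `c` over vertices `0..n-1`, the minimal spanning tree is built with Prim's method, and the depth-first search always starts at vertex 0. The names `st_tour` and `tour_cost` are hypothetical.

```python
def st_tour(n, c):
    """Sketch of Algorithm ST: MST (Prim) followed by a DFS preorder.

    c is a symmetric n x n cost matrix; returns a closed tour as a list
    of vertices that starts and ends at vertex 0.
    """
    # Step 1: minimal spanning tree rooted at vertex 0 (Prim's method).
    in_tree = [False] * n
    parent = [None] * n
    best = [float("inf")] * n
    best[0] = 0
    children = [[] for _ in range(n)]
    for _ in range(n):
        u = min((v for v in range(n) if not in_tree[v]), key=lambda v: best[v])
        in_tree[u] = True
        if parent[u] is not None:
            children[parent[u]].append(u)
        for v in range(n):
            if not in_tree[v] and c[u][v] < best[v]:
                best[v] = c[u][v]
                parent[v] = u

    # Step 2: order the vertices as a depth-first search of the tree visits them.
    order, stack = [], [0]
    while stack:
        u = stack.pop()
        order.append(u)
        stack.extend(reversed(children[u]))

    # Output: close the sequence to a Hamiltonian tour.
    return order + [0]


def tour_cost(tour, c):
    """Sum of the edge costs along a closed tour."""
    return sum(c[u][v] for u, v in zip(tour, tour[1:]))
```

On four points placed at positions 0, 1, 2, 3 of a line, with cost equal to the distance, the sketch returns the tour 0, 1, 2, 3, 0.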



It is well known that the algorithm ST and the Christofides algorithm are
$\alpha$-approximation algorithms for the $\Delta$-TSP with $\alpha = 2$ and $\alpha = \frac{3}{2}$, respectively. It
can be simply observed that both algorithms are consistent for the general TSP.
Since the triangle inequality is crucial for the approximability of the problem
instances of $\Delta$-TSP, we consider the following distance functions.

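We define for every $x = (G_x, c) \in L$ with $G_x = (V_x, E_x)$,

$$dist(x) = \max\left\{0, \max\left\{\frac{c(\{u,v\})}{c(\{u,p\}) + c(\{p,v\})} - 1 \;\middle|\; u, v, p \in V_x\right\}\right\}$$

and

$$distance(x) = \max\left\{0, \max\left\{\frac{c(\{u,v\})}{\sum_{i=1}^{m} c(\{p_i,p_{i+1}\})} - 1 \;\middle|\; u, v \in V_x,\ \text{and } u = p_1, p_2, \ldots, p_{m+1} = v \text{ is a simple path between } u \text{ and } v \text{ in } G_x\right\}\right\}.$$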

We observe that dist and distance measure the “degree” of violating the
triangle inequality in two different ways. Let $x = (G, c)$ be an input instance of
TSP. For simplicity we consider the size of x as the number of nodes of G
instead of the real length of the code of x over $\Sigma_I$. The inequality $dist(G, c) \le r$
implies

$$c(\{u,v\}) \le (1 + r) \cdot [c(\{u,w\}) + c(\{w,v\})]$$

for all pairwise different vertices u, v, w of G. The measure distance involves
a much harder requirement than dist. If $distance(G, c) \le r$, then $c(\{u,v\})$ may
not be larger than $(1 + r)$ times the cost of any path between u and v.^6 The
next results show that these two distance functions are really different because
the approximation algorithms for $\Delta$-TSP considered above are stable according
to distance but not according to dist.
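
The difference between the two measures can be checked on small instances. The sketch below assumes a symmetric cost matrix `c` with positive off-diagonal entries and a zero diagonal; it evaluates `distance` through shortest-path costs (Floyd-Warshall), which suffices because the worst ratio over all simple paths between u and v is attained on a cheapest one. The function names mirror the paper's dist and distance, but the implementation is only illustrative.

```python
from itertools import permutations


def dist(n, c):
    """Largest violation of the triangle inequality over single detours u-p-v."""
    worst = 0.0
    for u, v, p in permutations(range(n), 3):
        worst = max(worst, c[u][v] / (c[u][p] + c[p][v]) - 1)
    return worst


def distance(n, c):
    """Largest violation measured against the cheapest path between u and v."""
    sp = [row[:] for row in c]  # shortest-path costs, filled by Floyd-Warshall
    for p in range(n):
        for u in range(n):
            for v in range(n):
                if sp[u][p] + sp[p][v] < sp[u][v]:
                    sp[u][v] = sp[u][p] + sp[p][v]
    worst = 0.0
    for u in range(n):
        for v in range(n):
            if u != v:
                worst = max(worst, c[u][v] / sp[u][v] - 1)
    return worst
```

For the path 0, 1, 2, 3 with unit edges, cost 3 on the edges {0,2} and {1,3}, and cost 6 on {0,3}, the sketch gives dist = 0.5 but distance = 1.0, because the edge {0,3} is compared with the whole path of cost 3.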

Lemma 1. The algorithm ST and the Christofides algorithm are stable according
to distance.

^6 than the cost of the shortest path between u and v

Proof. We present the proof for the algorithm ST only. Let $x \in Ball_{r,distance}(L_I)$
for an $r \in \mathbb{R}^+$. Let $D_x$ be the Eulerian tour corresponding to the moves of
DFS in Step 2. Observe that $D_x$ goes twice via every edge of the minimal
spanning tree T. Since the cost of T is smaller than the cost of any optimal
Hamiltonian tour, the cost of $D_x$ is at most twice the cost of an optimal
Hamiltonian tour. Let $H_x = v_1, v_2, \ldots, v_n, v_1$ be the resulting Hamiltonian
tour. Obviously, $D_x$ can be written as $v_1 P_1 v_2 P_2 v_3 \ldots v_n P_n v_1$, where $P_i$ is a path
between $v_i$ and $v_{(i+1) \bmod n}$ in $D_x$. Since $c(\{v_i, v_{(i+1) \bmod n}\})$ is at most $(1 + r)$
times the cost of the path $v_i P_i v_{(i+1) \bmod n}$ for all $i \in \{1, 2, \ldots, n\}$, the cost
of $H_x$ is at most $(1 + r)$ times the cost of $D_x$. Since the cost of $D_x$ is at
most $2 \cdot Opt(x)$, the algorithm ST is a $(2 \cdot (1 + r))$-approximation algorithm for
$(\Sigma_I, \Sigma_O, L, Ball_{r,distance}(L_I), \mathcal{M}, cost, minimum)$. □

The next result shows that our approximation algorithms provide a very
weak approximation for input instances of $\Delta_{1+r}$-TSP, where $\Delta_{1+r}$-TSP =
$(\Sigma_I, \Sigma_O, L, Ball_{r,dist}(L_I), \mathcal{M}, cost, minimum)$.

Lemma 2. For every $r \in \mathbb{R}^+$, the Christofides algorithm is
$(r, \frac{3}{2} \cdot (1 + r)^{2\lceil \log_2 n \rceil})$-quasistable for dist, and the algorithm ST is
$(r, 2 \cdot (1 + r)^{\lceil \log_2 n \rceil})$-quasistable for dist.

Proof. Again, we give the proof for the algorithm ST only. Let $x = (G, c) \in
Ball_{r,dist}(L_I)$ for an $r \in \mathbb{R}^+$. Let $T$, $D_x$, and $H_x$ have the same meaning as in
the proof of Lemma 1. The crucial idea is the following one. To exchange a path
v, P, u of a length m, $m \in \mathbb{N}^+$, for the edge $\{v, u\}$ one can proceed as follows.
For any $p, s, t \in V(G)$ one can exchange the path p, s, t for the edge $\{p, t\}$ by the
cost increase bounded by the multiplicative constant $(1 + r)$. This means that
reducing the length m of a path to the length $\lceil m/2 \rceil$ increases the cost of the
connection between u and v by at most $(1 + r)$-times. After at most $\lceil \log_2 m \rceil$
such reduction steps one reduces the path v, P, u to the path v, u and

$$cost(u, v) = c(\{v, u\}) \le (1 + r)^{\lceil \log_2 m \rceil} \cdot cost(v, P, u).$$

Following the argumentation of the proof of Lemma 1 we have $c(D_x) \le 2 \cdot Opt(x)$.
Since $m \le n$ for the length m of any path of $D_x$ exchanged by a single edge, we
obtain

$$c(H_x) \le (1 + r)^{\lceil \log_2 n \rceil} \cdot c(D_x) \le 2 \cdot (1 + r)^{\lceil \log_2 n \rceil} \cdot Opt(x).$$

□

Now, we show that our approximation algorithms are really unstable according
to dist. To show this, we construct an input for which the Christofides
algorithm provides a very poor approximation.

We construct a weighted complete graph from $Ball_{r,dist}(L_I)$ as follows. We
start with the path $p_0, p_1, \ldots, p_n$ for $n = 2^k$, $k \in \mathbb{N}$, where every edge $\{p_i, p_{i+1}\}$
has the cost 1. Then we add edges $\{p_i, p_{i+2}\}$ for $i = 0, 1, \ldots, n - 2$ with the cost
$2 \cdot (1 + r)$. Generally, for every $m \in \{1, \ldots, \log_2 n\}$, we define
$c(\{p_i, p_{i+2^m}\}) = 2^m \cdot (1 + r)^m$ for $i = 0, \ldots, n - 2^m$. For all other edges one can
take the maximal possible cost in such a way that the constructed input is in
$Ball_{r,dist}(L_I)$.

Fig. 1. (Figure not reproduced; the surviving edge labels read $4 \cdot (1 + r)^2$.)

Let us have a look at the work of our algorithms on this input. There is
only one minimal spanning tree that corresponds to the path containing all
edges of the weight 1. Since every path contains exactly two nodes of odd
degree, the Euler tour constructed by the Christofides algorithm is the cycle
$D = p_0, p_1, p_2, \ldots, p_n, p_0$ with the n edges of weight 1 and the edge of the
maximal weight $n \cdot (1 + r)^{\log_2 n} = n^{1 + \log_2(1 + r)}$. Since the Euler tour is a
Hamiltonian tour, the output of the Christofides algorithm is unambiguously the cycle
$p_0, p_1, \ldots, p_n, p_0$ with the cost $n + n(1 + r)^{\log_2 n}$. Observe that the algorithm ST
computes as output the same tour if the depth-first search started from $p_0$ or
from $p_n$. But the optimal tour is

$$T = p_0, p_2, p_4, \ldots, p_{2i}, p_{2(i+1)}, \ldots, p_n, p_{n-1}, p_{n-3}, \ldots, p_{2i+1}, p_{2i-1}, \ldots, p_3, p_1, p_0.$$

This tour contains two edges $\{p_0, p_1\}$ and $\{p_{n-1}, p_n\}$ of the weight 1 and all $n - 1$
edges of the weight $2 \cdot (1 + r)$. Thus, $Opt = cost(T) = 2 + 2 \cdot (1 + r) \cdot (n - 1)$, and

$$\frac{cost(D)}{cost(T)} = \frac{n + n \cdot (1 + r)^{\log_2 n}}{2 + 2 \cdot (1 + r) \cdot (n - 1)} \ge \frac{n^{1 + \log_2(1 + r)}}{2 \cdot n \cdot (1 + r)} = \frac{n^{\log_2(1 + r)}}{2 \cdot (1 + r)}.$$

So, we have proved the following result.

Lemma 3. For every $r \in \mathbb{R}^+$, if the Christofides algorithm (the algorithm ST)
is $(r, f(n, r))$-quasistable for dist, then $f(n, r) \ge n^{\log_2(1 + r)} / (2 \cdot (1 + r))$.

Corollary 1. The Christofides algorithm and the algorithm ST are unstable for
dist.
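
The lower bound can be checked numerically on the constructed instance. The sketch below is an illustration only: it uses just the edge costs fixed by the construction (edges joining $p_i$ and $p_{i+2^m}$ cost $2^m (1+r)^m$), builds the cycle D and the even-then-odd tour T as vertex sequences, and sums their costs directly instead of relying on a closed formula. All function names are hypothetical.

```python
from math import log2


def edge_cost(i, j, r):
    """Cost of the edge {p_i, p_j} when |i - j| is a power of two."""
    gap = abs(i - j)
    m = gap.bit_length() - 1
    if gap == 0 or gap != 1 << m:
        raise ValueError("edge cost is not fixed by the construction")
    return (2 ** m) * (1 + r) ** m


def tour_cost(tour, r):
    return sum(edge_cost(u, v, r) for u, v in zip(tour, tour[1:]))


def ratio_on_hard_instance(k, r):
    """cost(D) / cost(T) for n = 2**k."""
    n = 2 ** k
    d = list(range(n + 1)) + [0]                                    # p_0, ..., p_n, p_0
    t = list(range(0, n + 1, 2)) + list(range(n - 1, 0, -2)) + [0]  # evens up, odds down
    return tour_cost(d, r) / tour_cost(t, r)


def lemma3_bound(k, r):
    """The bound n^{log2(1+r)} / (2(1+r)) for n = 2**k."""
    n = 2 ** k
    return n ** log2(1 + r) / (2 * (1 + r))
```

For k = 10 and r = 1 the ratio is about 256.4, slightly above the bound 256, and it grows polynomially in n for every fixed r > 0.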

The results above show that the Christofides algorithm can be useful for
a much larger set of inputs than the original input set. The key point is to find
a suitable distance measure. In our case this measure is distance but not dist. An
interesting question is whether one can modify these algorithms in such a way
that the modified versions would be stable for dist.

Surprisingly, this is possible for both algorithms. To make the Christofides
algorithm stable according to dist one first takes a minimal perfect path matching
instead of a minimal perfect matching. In this way the cost of the Euler
tour w remains at most $\frac{3}{2}$ times the cost of an optimal Hamiltonian tour for any input
instance of TSP. Secondly, we need a special procedure to produce a Hamiltonian
tour H by shortening paths of w of the length at most 4 by an edge of
H. Another possibility is to modify the ST algorithm. As we have observed in
the example in Figure 1, the main problem is that shortening a path
of a length m can increase the cost of this path by the multiplicative factor
$(1 + r)^{\lceil \log_2 m \rceil}$. To modify the ST algorithm one has to replace the DFS-order of
the vertices of the minimal spanning tree by another order that shortens parts
of the Euler tour with length at most 3. There are too many technicalities to
explain these algorithms here in complete detail. So, we present
only the final results achieved in the recent papers [1,5,6].

Theorem 1. For every $\beta > 1$,

(i) $\Delta_\beta$-TSP can be approximated in polynomial time within the approximation
ratio $\min\{4\beta, \frac{3}{2}\beta^2\}$, and

(ii) unless P=NP, $\Delta_\beta$-TSP cannot be approximated within approximation
ratio $1 + \varepsilon \cdot \beta$ for some $\varepsilon > 0$.



4 Superstability and Knapsack Problem 

In this section we do not try to present any new result. We only use the known 
PTASs for the Knapsack problem (KP) to illustrate the transparency of the 
approximation stability point of view on their development. First we show that 
the original PTAS for the simple Knapsack problem (SKP) is stable, but not 
superstable, according to a reasonable distance function. So, the application of this
PTAS for KP leads to an approximation algorithm, but not to a PTAS. Then, 
we show that the known PTAS for KP is a simple modification of the PTAS for 
SKP, that makes this PTAS superstable according to our distance function. 
The well-known PTAS for SKP works as follows: 

PTAS SKP

Input: positive integers $w_1, w_2, \ldots, w_n, b$ for some $n \in \mathbb{N}$ and some positive real
number $1 > \varepsilon > 0$.

Step 1: Sort $w_1, w_2, \ldots, w_n$. For simplicity we may assume $w_1 \ge w_2 \ge \cdots \ge w_n$.

Step 2: Set $k = \lceil 1/\varepsilon \rceil$.

Step 3: For every set $T = \{i_1, i_2, \ldots, i_l\} \subseteq \{1, 2, \ldots, n\}$ with $|T| = l \le k$
and $\sum_{i \in T} w_i \le b$, extend T to T' by using the greedy method and values
$w_{i_l+1}, w_{i_l+2}, \ldots, w_n$. (The greedy method stops if $\sum_{i \in T'} w_i \le b$ and
$w_j > b - \sum_{i \in T'} w_i$ for all $j \notin T'$, $j > k$.)

Output: The best set T' constructed in Step 3.
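
The three steps can be sketched as follows. This is an illustrative reading, not the author's implementation: the greedy extension is assumed to scan, in sorted order, the items that follow the largest index of T and to add every item that still fits; the function name `ptas_skp` and the returned list of chosen weights are choices made for the example.

```python
from itertools import combinations
from math import ceil


def ptas_skp(weights, b, eps):
    """Sketch of PTAS SKP: returns the weights of the best set T' found."""
    w = sorted(weights, reverse=True)          # Step 1: w_1 >= w_2 >= ... >= w_n
    n = len(w)
    k = ceil(1 / eps)                          # Step 2
    best, best_total = [], 0
    for size in range(min(k, n) + 1):          # Step 3: all T with |T| <= k
        for T in combinations(range(n), size):
            total = sum(w[i] for i in T)
            if total > b:
                continue                       # T itself is not feasible
            chosen = list(T)
            start = T[-1] + 1 if T else 0
            for j in range(start, n):          # greedy extension of T to T'
                if total + w[j] <= b:
                    chosen.append(j)
                    total += w[j]
            if total > best_total:
                best, best_total = chosen, total
    return [w[i] for i in best]
```

The number of examined sets T is bounded by a polynomial in n whose degree depends on k, which is why the running time is polynomial for every fixed value of the accuracy parameter.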



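For every given $\varepsilon$, we denote the above algorithm by $A_\varepsilon$. It is known that
$A_\varepsilon$ is a $(1 + \varepsilon)$-approximation algorithm for SKP. Observe that it is consistent
for KP. Now, we consider the following distance function DIST for any input
$w_1, w_2, \ldots, w_n, b, c_1, \ldots, c_n$ of KP:

$$DIST(w_1, \ldots, w_n, b, c_1, \ldots, c_n) = \max\left\{\max\left\{\frac{c_i - w_i}{w_i} \;\middle|\; c_i \ge w_i,\ i \in \{1, \ldots, n\}\right\}, \max\left\{\frac{w_i - c_i}{c_i} \;\middle|\; w_i \ge c_i,\ i \in \{1, \ldots, n\}\right\}\right\}.$$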
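
A small sketch of this distance function follows. It assumes the reading in which DIST is the largest relative deviation between the cost and the weight of an item, taken with respect to the smaller of the two values; this reading is consistent with the bound $(1 + \delta)^{-1} \le c_i / w_i \le 1 + \delta$ used later in the proof of Lemma 4. The name `knapsack_dist` is hypothetical.

```python
def knapsack_dist(weights, costs):
    """Largest relative deviation between c_i and w_i over all items."""
    return max(
        (max(w, c) - min(w, c)) / min(w, c)
        for w, c in zip(weights, costs)
    )
```

Every instance of SKP has distance 0, and an instance with `knapsack_dist` at most $\delta$ satisfies $w_i / (1 + \delta) \le c_i \le (1 + \delta) \cdot w_i$ for all items.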




Let $KP_\delta = (\Sigma_I, \Sigma_O, L, Ball_{\delta,DIST}(L_I), \mathcal{M}, cost, maximum)$ for any $\delta$. Now, we
show that PTAS SKP is stable according to DIST but this result does not imply
the existence of a PTAS for $KP_\delta$ for any $\delta > 0$.

Lemma 4. For every $\varepsilon > 0$, and every $\delta > 0$, the algorithm $A_\varepsilon$ is a
$(1 + \varepsilon + \delta(2 + \delta) \cdot (1 + \varepsilon))$-approximation algorithm for $KP_\delta$.



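Proof. Let $w_1 \ge w_2 \ge \cdots \ge w_n$ for an input $I = w_1, \ldots, w_n, b, c_1, \ldots, c_n$, and
let $k = \lceil 1/\varepsilon \rceil$. Let $U = \{i_1, i_2, \ldots, i_l\} \subseteq \{1, 2, \ldots, n\}$ be an optimal solution for
I. If $l \le k$, then $A_\varepsilon$ outputs an optimal solution with $cost(U)$ because $A_\varepsilon$ has
considered U as a candidate for the output in Step 3.

Consider the case $l > k$. $A_\varepsilon$ has considered the greedy extension of
$T = \{i_1, i_2, \ldots, i_k\}$ in Step 3. Let $T' = \{i_1, i_2, \ldots, i_k, j_{k+1}, \ldots, j_{k+r}\}$ be the greedy
extension of T. Obviously, it is sufficient to show that the difference $cost(U) -
cost(T')$ is small relative to $cost(U)$, because the cost of the output of $A_\varepsilon$ is at
least $cost(T')$. We distinguish the following two possibilities according to the
weights of the feasible solutions U and T':

1. Let $\sum_{i \in U} w_i - \sum_{j \in T'} w_j \le 0$.

Obviously, for every i, $(1 + \delta)^{-1} \le \frac{c_i}{w_i} \le 1 + \delta$. So,

$$cost(U) = \sum_{i \in U} c_i \le (1 + \delta) \cdot \sum_{i \in U} w_i$$

and

$$cost(T') = \sum_{j \in T'} c_j \ge (1 + \delta)^{-1} \cdot \sum_{j \in T'} w_j.$$

In this way we obtain

$$\begin{aligned}
cost(U) - cost(T') &\le (1 + \delta) \cdot \sum_{i \in U} w_i - (1 + \delta)^{-1} \cdot \sum_{j \in T'} w_j \\
&\le (1 + \delta) \cdot \sum_{i \in U} w_i - (1 + \delta)^{-1} \cdot \sum_{i \in U} w_i = \frac{\delta \cdot (2 + \delta)}{1 + \delta} \cdot \sum_{i \in U} w_i \\
&\le \frac{\delta \cdot (2 + \delta)}{1 + \delta} \cdot \sum_{i \in U} (1 + \delta) \cdot c_i = \delta \cdot (2 + \delta) \cdot \sum_{i \in U} c_i \\
&= \delta \cdot (2 + \delta) \cdot cost(U).
\end{aligned}$$

Finally,

$$\frac{cost(U) - cost(T')}{cost(U)} \le \frac{\delta \cdot (2 + \delta) \cdot cost(U)}{cost(U)} = \delta \cdot (2 + \delta).$$

2. Let $d = \sum_{i \in U} w_i - \sum_{j \in T'} w_j > 0$.

Let c be the cost of the first part of U with the weight $\sum_{j \in T'} w_j$. Then in
the same way as in 1. one can establish

$$\frac{c - cost(T')}{c} \le \delta \cdot (2 + \delta). \qquad (1)$$

It remains to bound $cost(U) - c$, i.e. the cost of the last part of U with the
weight d. Obviously, $d \le b - \sum_{j \in T'} w_j \le w_{i_r}$ for some $r > k$, $i_r \in U$ (if
not, then $A_\varepsilon$ would add $i_r$ to T' in the greedy procedure). Since
$w_{i_1} \ge w_{i_2} \ge \cdots \ge w_{i_l}$,

$$d \le w_{i_r} \le \frac{w_{i_1} + w_{i_2} + \cdots + w_{i_r}}{r} \le \frac{\sum_{i \in U} w_i}{k + 1} < \varepsilon \cdot \sum_{i \in U} w_i. \qquad (2)$$

Since $cost(U) \le c + d \cdot (1 + \delta)$ we obtain

$$\begin{aligned}
\frac{cost(U) - cost(T')}{cost(U)} &\le \frac{c + d \cdot (1 + \delta) - cost(T')}{cost(U)} \\
&\overset{(2)}{\le} \frac{c - cost(T')}{cost(U)} + \frac{(1 + \delta) \cdot \varepsilon \cdot \sum_{i \in U} w_i}{cost(U)} \\
&\overset{(1)}{\le} \delta \cdot (2 + \delta) + (1 + \delta) \cdot \varepsilon \cdot (1 + \delta) \\
&= 2\delta + \delta^2 + \varepsilon \cdot (1 + \delta)^2 \\
&= \varepsilon + \delta \cdot (2 + \delta) \cdot (1 + \varepsilon).
\end{aligned}$$

□

We see that the PTAS SKP is stable according to DIST, but this does not
suffice to get a PTAS for $KP_\delta$ for any $\delta > 0$. This is because in the approximation
ratio we have the additive factor $\delta \cdot (2 + \delta)$ that is independent of $\varepsilon$. In what
follows we change the PTAS SKP a little bit in such a way that we obtain
a PTAS for every $KP_\delta$, $\delta > 0$. The modification idea is very natural: to sort the
input values according to the cost per one weight unit.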

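PTAS MOD-SKP

Input: positive integers $w_1, w_2, \ldots, w_n, b, c_1, \ldots, c_n$ for some $n \in \mathbb{N}$ and some
positive real number $\varepsilon$, $1 > \varepsilon > 0$.

Step 1: Sort $\frac{c_1}{w_1}, \frac{c_2}{w_2}, \ldots, \frac{c_n}{w_n}$. For simplicity we may assume
$\frac{c_i}{w_i} \ge \frac{c_{i+1}}{w_{i+1}}$ for $i = 1, \ldots, n - 1$.

Step 2: Set $k = \lceil 1/\varepsilon \rceil$.

Step 3: The same as Step 3 of PTAS SKP, but the greedy procedure follows
the ordering of the $w_i$'s of Step 1.

Output: The best T' constructed in Step 3.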
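
A sketch of the modified scheme, under the same assumptions as for PTAS SKP: the greedy extension scans the items that follow the largest index of T in the order of Step 1 and adds every item that still fits. The name `mod_skp` and the returned pair (best cost, chosen (weight, cost) items) are illustrative choices, not part of the paper.

```python
from itertools import combinations
from math import ceil


def mod_skp(weights, costs, b, eps):
    """Sketch of PTAS MOD-SKP for the general knapsack problem."""
    # Step 1: order the items by cost per weight unit, best first.
    items = sorted(zip(weights, costs), key=lambda wc: wc[1] / wc[0], reverse=True)
    n = len(items)
    k = ceil(1 / eps)                                   # Step 2
    best_cost, best_items = 0, []
    for size in range(min(k, n) + 1):                   # Step 3
        for T in combinations(range(n), size):
            weight = sum(items[i][0] for i in T)
            if weight > b:
                continue
            chosen = list(T)
            start = T[-1] + 1 if T else 0
            for j in range(start, n):                   # greedy by cost per weight unit
                if weight + items[j][0] <= b:
                    chosen.append(j)
                    weight += items[j][0]
            cost = sum(items[i][1] for i in chosen)
            if cost > best_cost:
                best_cost, best_items = cost, [items[i] for i in chosen]
    return best_cost, best_items
```

The only changes with respect to the sketch of PTAS SKP are the sorting key and the objective: feasibility is still decided by the weights, while the quality of a set is measured by its cost.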

Let MOD-SKP$_\varepsilon$ denote the algorithm given by PTAS MOD-SKP for a fixed
$\varepsilon > 0$.

Lemma 5. For every $\varepsilon$, $1 > \varepsilon > 0$ and every $\delta > 0$, MOD-SKP$_\varepsilon$ is a
$(1 + \varepsilon \cdot (1 + \delta)^2)$-approximation algorithm for $KP_\delta$.

Proof. Let $U = \{i_1, i_2, \ldots, i_l\} \subseteq \{1, 2, \ldots, n\}$, where $w_{i_1} \le w_{i_2} \le \cdots \le w_{i_l}$, be
an optimal solution for the input $I = w_1, \ldots, w_n, b, c_1, \ldots, c_n$.

If $l \le k$ then MOD-SKP$_\varepsilon$ provides an optimal solution.

If $l > k$, then we consider a $T' = \{i_1, i_2, \ldots, i_k, j_{k+1}, \ldots, j_{k+r}\}$ as a greedy
extension of $T = \{i_1, i_2, \ldots, i_k\}$. Again, we distinguish two possibilities:

1. Let $\sum_{i \in U} w_i - \sum_{j \in T'} w_j < 0$.

Now, we show that this contradicts the optimality of U. Both $cost(U)$
and $cost(T')$ contain $\sum_{s=1}^{k} c_{i_s}$. For the rest, T' contains the best choice of
$w_i$'s according to the cost of one weight unit. The choice of U per one weight
unit cannot be better. So, $cost(U) < cost(T')$.

2. Let $d = \sum_{i \in U} w_i - \sum_{j \in T'} w_j \ge 0$.

Because of the optimal choice of T' according to the cost per one weight
unit, the cost c of the first part of U with the weight $\sum_{j \in T'} w_j$ is at most
$cost(T')$, i.e.,

$$c - cost(T') \le 0. \qquad (3)$$

Since U and T' contain the same k indices $i_1, i_2, \ldots, i_k$ and $w_{i_1}, \ldots, w_{i_k}$ are
the largest weights in both U and T', the same consideration as in the
proof of Lemma 4 (see (2)) yields

$$d \le \varepsilon \cdot \sum_{i \in U} w_i \quad \text{and} \quad cost(U) \le c + d \cdot (1 + \delta). \qquad (4)$$

Thus,

$$\frac{cost(U) - cost(T')}{cost(U)} \le \frac{c + d \cdot (1 + \delta) - cost(T')}{cost(U)} \overset{(3)}{\le} \frac{d \cdot (1 + \delta)}{cost(U)} \overset{(4)}{\le} \frac{\varepsilon \cdot (1 + \delta) \cdot \sum_{i \in U} w_i}{cost(U)} \le \varepsilon \cdot (1 + \delta)^2.$$

□



We observe that the collection of MOD-SKP$_\varepsilon$ algorithms is a PTAS for every
$KP_\delta$ with a constant $\delta > 0$ (independent of the size $2n + 1$ of the input).

5 Conclusion and Discussion 

In the previous sections we have introduced the concept of stability of approxima- 
tions. Here we discuss the potential applicability and usefulness of this concept. 
Using this concept, one can establish positive results of the following types: 

1. An approximation algorithm or a PTAS can be successfully used for a larger 
set of inputs than the set usually considered (see Lemma 1 and Lemma 4). 

2. We are not able to successfully apply a given approximation algorithm A 
(a PTAS) for additional inputs, but one can simply modify A to get a new 
approximation algorithm (a new PTAS) working for a larger set of inputs 
than the set of inputs of A. 

3. To learn that an approximation algorithm is unstable for a distance measure 
could lead to the development of completely new approximation algorithms 
that would be stable according to the considered distance measure. 

The following types of negative results may be achieved: 

4. An approximation algorithm may turn out to be unstable according to all
“reasonable” distance measures, so that its use is really restricted to the
original input set.

5. Let $Q = (\Sigma_I, \Sigma_O, L, L_I, \mathcal{M}, cost, goal) \in NPO$ be well approximable. If,
for a distance measure D and a constant r, one proves the nonexistence
of any polynomial-time approximation algorithm for $Q_{r,D} = (\Sigma_I, \Sigma_O, L,
Ball_{r,D}(L_I), \mathcal{M}, cost, goal)$, then this means that the problem Q is
“unstable” according to D.

Thus, using the notion of stability one can search for a spectrum of the hardness
of a problem according to the set of inputs. For instance, considering a hard
problem like TSP one could get an infinite sequence of input languages $L_0, L_1, L_2, \ldots$
given by some distance measure, where $\varepsilon_r(n)$ is the best achievable relative error
for the language $L_r$. Results of this kind can essentially contribute to the study
of the nature of hardness of specific computing problems.

Abstract. We survey the complexity issues related to several algorithmic
problems for compressed one- and two-dimensional texts without
explicit decompression: pattern-matching, equality-testing, computation
of regularities, subsegment extraction, language membership, and
solvability of word equations. Our basic problem is one- and two-dimensional
pattern-matching together with its variations. For some types of
compression the pattern-matching problems are infeasible (NP-hard), for
other types they are solvable in polynomial time and we discuss how to
reduce the degree of corresponding polynomials.



1 Introduction 

In the last decade a new stream of research related to data compression has 
emerged: algorithms on compressed objects. It has been caused by the increase 
in the volume of data and the need to store and transmit masses of informa- 
tion in compressed form. The compressed information has to be quickly accessed 
and processed without explicit decompression. In this paper we consider several
problems for compressed strings and arrays. The complexity of basic string
problems in the compressed setting for one-dimensional texts is polynomial, but it
jumps if we pass over to two dimensions. Our basic computational problem is
the fully compressed matching:

Instance: $\mathcal{P} = Compress(P)$ and $\mathcal{T} = Compress(T)$, representing the
compressed pattern and the compressed text.

Question: does $Decompress(\mathcal{P})$ occur in $Decompress(\mathcal{T})$?

We can change the way we formulate a problem instance by representing the
pattern directly in uncompressed form, i.e., as a text or an array P. This defines
the compressed matching problem. We may also add to the problem instance the
coordinates of a location of P within the text, and ask whether the text contains
an occurrence of the pattern at this location. By representing the pattern in the
compressed and uncompressed form we define the problems of fully compressed
pattern checking and of compressed pattern checking.

In this paper we are mostly interested in highly compressed objects, which
means that the real size (of the uncompressed object) is potentially exponential with
respect to the size of its compressed representation. For one theoretical type of
compression (in terms of morphisms) this can be even doubly exponential. In case
of high compression the existence of polynomial time algorithms for many basic
questions is a nontrivial challenge. High compression could be present in practical
situations especially in compressing images. The theoretically interesting highly
compressible strings are Fibonacci words. The n-th Fibonacci word is described
by the recurrences: $F_1 := b$; $F_2 := a$; $F_n := F_{n-1} \cdot F_{n-2}$.
For example

F_10 = abaababaabaababaababaabaababaabaababaababaabaababaababa.

Two interesting examples of highly compressed arrays are the k-th rank square
corner of Sierpinski triangle, denoted by $S_k$, see Figure 4, and the k-th order
Hilbert array $H_k$, see Figure 3. $S_k$ is a $2^k \times 2^k$ black and white array defined
recursively: $S_0$ consists of a single black element, and $S_k$ is a composition of
3 disjoint copies of $S_{k-1}$ and a totally white (blank) subarray, see Figure 1. The
array $H_k$ represents the sequence of moves in a strongly recursive traversal of a
$2^k \times 2^k$ array starting and finishing at fixed corners. The traversal of a square array
is strongly recursive iff when entering any of its four quadrants all fields of this
quadrant are visited (each one exactly once) before leaving the quadrant, and the
same property holds for each quadrant and its sub-quadrants. The compressed
matching problem for these particular examples is: check if an explicitly given
word (array) P of total size m is a subword (subarray) of $F_n$ ($H_n$, $S_n$), and find the
number of occurrences. Observe that in both cases we cannot simply construct
$H_n$ or $S_n$ since their uncompressed sizes are exponential with respect to n.
Such a description of $F_k$, $S_k$ and $H_k$ corresponds to one- and two-dimensional
straight-line programs (SLP's), defined formally later. An SLP is a way of
describing larger objects in terms of their (smaller) parts.

* Supported by the grant KBN 8T11C03915.





Fig. 1. The recursive structure of $S_k$ (three copies of $S_{k-1}$ and one blank quadrant)



2 Sequential Searching in Compressed Texts 

We discuss several types of 1-dimensional compression: run-length, straight-line
programs (SLP's), LZ, LZW, and antidictionary compressions. We concentrate
on LZ and SLP-compressions as the most interesting and closely related potentially
exponential compressions. The key concepts in highly compressed matching
algorithms are periodicity and linearly-succinct representations of exponentially
many periods.

2.1 Periodicities in Strings

A nonnegative integer p is a period of a nonempty string w iff $w[i] = w[i - p]$,
whenever both sides are defined.

Lemma 1. [11] If w has two periods p, q such that $p + q \le |w|$ then $\gcd(p, q)$
is a period of w, where gcd means “greatest common divisor”.

Denote $Periods(w) = \{p : p \text{ is a period of } w\}$. A set of integers forming an
arithmetic progression is called here linear. We say that a set of positive integers
from $[1 \ldots U]$ is linearly-succinct iff it can be decomposed in at most $\lfloor \log_2(U) \rfloor + 1$
linear sets. For example $Periods(aba) = \{0, 2, 3\}$.

Lemma 2. [34] The set $Periods(w)$ is linearly-succinct w.r.t. $|w|$.
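
The definition of a period can be checked directly on short strings. The following brute-force sketch (quadratic time, so meant for explicit strings only, not for compressed ones) uses the hypothetical name `periods` and 0-based indexing.

```python
def periods(w):
    """All p in 0..len(w) with w[i] == w[i - p] whenever both sides are defined."""
    n = len(w)
    return {p for p in range(n + 1) if all(w[i] == w[i - p] for i in range(p, n))}
```

The call `periods("aba")` reproduces the example set {0, 2, 3}. On `"abababab"` the periods 2 and 4 satisfy 2 + 4 <= 8, and their greatest common divisor 2 is again a period, as Lemma 1 states.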

Denote $ArithProg(i, p, k) = \{i, i + p, i + 2p, \ldots, i + kp\}$, so it is an arithmetic
progression of length $k + 1$. Its description is given by numbers i, p, k written in
binary. The size of the description is the total number of bits in i, p, k. Denote
by $Solutions(p, U, W) = (k, s)$ where k is any position $i \in U$ such that $i + j = p$
for some $j \in W$ and s is the number of such numbers k. The following lemma is
used to count the number of pattern occurrences in highly compressed texts.

Lemma 3. Assume that two linear sets $U, W \subseteq [1 \ldots N]$ are given by their
descriptions. Then for a given number $c \in [1 \ldots N]$ we can compute
$Solutions(c, U, W)$ in polynomial time with respect to $\log(N)$.
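
One possible way to obtain such a bound is to reduce the question to a linear Diophantine equation. The sketch below is an assumption about how Lemma 3 could be realized, not the construction of the paper: a linear set is passed as its description `(i, p, k)` with step `p >= 1`, and the function returns one witness position together with the number of witnesses, using only a constant number of arithmetic operations on the descriptions. It relies on `pow(x, -1, m)` for modular inverses, which requires Python 3.8 or later.

```python
from math import gcd


def solutions(c, U, W):
    """Return (position, count) of the i in U with c - i in W.

    U = (a, p, k) describes {a + x*p : 0 <= x <= k}; W = (b, q, l) likewise.
    Steps p and q are assumed to be at least 1. position is None if count is 0.
    """
    a, p, k = U
    b, q, l = W
    t = c - a - b                      # need x*p + y*q == t with 0<=x<=k, 0<=y<=l
    g = gcd(p, q)
    if t % g != 0:
        return None, 0
    p1, q1, t1 = p // g, q // g, t // g
    # x must satisfy x*p1 == t1 (mod q1); p1 and q1 are coprime.
    x0 = (t1 * pow(p1, -1, q1)) % q1 if q1 > 1 else 0
    # 0 <= y <= l  is equivalent to  t - l*q <= x*p <= t.
    lo = max(0, -((l * q - t) // p))   # ceil((t - l*q) / p)
    hi = min(k, t // p)
    if hi < lo:
        return None, 0
    first = lo + ((x0 - lo) % q1)      # smallest admissible x >= lo
    if first > hi:
        return None, 0
    return a + first * p, (hi - first) // q1 + 1
```

For U = {1, 3, 5, 7, 9}, W = {2, 5, 8, 11} and c = 10 the only witness is i = 5 (with j = 5), so the sketch returns the pair (5, 1).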



2.2 Run-Length Compression 

The run-length compression (denoted by RLC(w)) of the string w is its repre- 
sentation in the form w = where afs are single symbols and 

Oi 7^ for 1 < i < /c. Denote the size of the compressed representation by 
n = \RLC{w)\ = k. We ignore here the sizes of integers and assume that each 
arithmetic operation takes a unit time. 

Theorem 1. [3] Assume we are given run-length encodings of the text $T$ and
the pattern $P$ of sizes $n = |RLC(T)|$ and $m = |RLC(P)|$. Then we can check
for an occurrence of $P$ in $T$ in $O(n + m)$ time.

Proof. Assume that the pattern and the text are nontrivial words: each of
them contains at least two distinct letters. Let $T = a_1^{r_1} a_2^{r_2} \ldots a_k^{r_k}$ and
$P = b_1^{t_1} b_2^{t_2} \ldots b_s^{t_s}$. Construct

$T' = r_2 r_3 \ldots r_{k-1}$, $P' = t_2 t_3 \ldots t_{s-1}$,

$\alpha(T) = a_2 a_3 \ldots a_{k-1}$ and $\alpha(P) = b_2 b_3 \ldots b_{s-1}$.

We search for all occurrences of $P'$ in $T'$ and simultaneously of $\alpha(P)$ in $\alpha(T)$. For
each starting occurrence $i$ a constant-time additional work suffices to check if $P$
occurs at $i$ in $T$.
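
The structure of this proof can be sketched in Python as follows. This is a simplified illustration under our own assumptions: the inner runs are compared with a naive scan, so the sketch runs in $O(n \cdot m)$ time rather than the $O(n + m)$ of Theorem 1 (a linear-time string matcher would be needed for that), and the names `rlc` and `rlc_occurs` are ours.

```python
from itertools import groupby

def rlc(w):
    """Run-length compression: list of (symbol, run length) pairs."""
    return [(a, len(list(run))) for a, run in groupby(w)]

def rlc_occurs(T, P):
    """Decide whether P occurs in T by comparing run-length encodings only."""
    t, p = rlc(T), rlc(P)
    s = len(p)
    if s < 2:
        raise ValueError("pattern must contain at least two distinct letters")
    for j in range(len(t) - s + 1):
        # first and last run of P may be shorter than the matching runs of T
        first_ok = t[j][0] == p[0][0] and t[j][1] >= p[0][1]
        last_ok = t[j + s - 1][0] == p[-1][0] and t[j + s - 1][1] >= p[-1][1]
        # the inner runs of P must coincide exactly with runs of T
        if first_ok and last_ok and t[j + 1:j + s - 1] == p[1:-1]:
            return True
    return False

print(rlc("aaabccdddd"))                 # [('a', 3), ('b', 1), ('c', 2), ('d', 4)]
print(rlc_occurs("aaabbbcc", "abbbc"))   # True
print(rlc_occurs("aaabbbcc", "abbc"))    # False
```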






2.3 1-Dimensional Straight-Line Programs 

A straight-line program (SLP) $\mathcal{R}$ is a sequence of assignment statements:

$X_1 := expr_1;\ X_2 := expr_2;\ \ldots;\ X_n := expr_n$

where $X_i$ are variables and $expr_i$ are expressions of the form:

$expr_i$ is a symbol of a given alphabet, or $expr_i = X_j \cdot X_k$ for some $j, k < i$,

where $\cdot$ denotes the concatenation of $X_j$ and $X_k$.

Theorem 2. [34,23] The first occurrence and the number of all occurrences
of an SLP-compressed pattern in an SLP-compressed text can be computed in
polynomial time.

Proof. (Sketch)

For two variables $X, Y$ define $Overlaps(X, Y)$ as the set of all positions $i$ in $Y$
such that the suffix of $Y$ which starts at $i$ is a prefix of $X$. Due to Lemma 2 the
sets $Overlaps(X, Y)$ are linearly-succinct. In the pattern matching problem we
compute $Overlaps(X_i, P)$ and $Overlaps(P, X_j)$ for all variables $X_i$, $X_j$ in the
SLP describing $T$ (bottom-up). Then we use Lemma 3 to check if there is an
occurrence of the pattern overlapping the splitting point, see Figure 2, of some
variable $X_k$.




Fig. 2. A pattern occurs on a splitting point of the variable $X_k$ iff
$Solutions(|P|, U1, U2) \ne \emptyset$, where $U1 = Overlaps(P, X_i)$, $U2 = Overlaps(X_j, P)$



Example. The 5th Fibonacci word is described by the following SLP:

$X_1 := b;\ X_2 := a;\ X_3 := X_2 X_1;\ X_4 := X_3 X_2;\ X_5 := X_4 X_3$

Using our algorithm one can effectively find, for example, an occurrence (if
there is any) of the Fibonacci word $F_{220}$ in the Thue-Morse word $\psi_{200}$ (see [40]
and the last section for definition), despite the fact that the real lengths of these
strings are astronomic (for instance $|\psi_{200}| = 2^{200}$).
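
To make the notion of working on an SLP without decompression concrete, here is a minimal Python sketch. The encoding of rules as a list (a one-letter string or a pair of earlier 1-based variable indices) and the helper names are our own assumptions. It computes the length of every variable bottom-up and reads a single symbol by descending from the last variable, which takes a number of steps proportional to the number of rules even when the generated word is astronomically long.

```python
def slp_lengths(slp):
    """Length of the word generated by each variable, computed bottom-up."""
    lengths = []
    for rule in slp:
        if isinstance(rule, str):
            lengths.append(1)
        else:
            j, k = rule
            lengths.append(lengths[j - 1] + lengths[k - 1])
    return lengths

def slp_char_at(slp, lengths, pos):
    """Symbol at 0-based position pos of the word generated by the last variable."""
    v = len(slp)
    while True:
        rule = slp[v - 1]
        if isinstance(rule, str):
            return rule
        j, k = rule
        if pos < lengths[j - 1]:
            v = j                      # position falls into the left part X_j
        else:
            pos -= lengths[j - 1]      # position falls into the right part X_k
            v = k

def fibonacci_slp(n):
    """X_1 := b; X_2 := a; X_i := X_{i-1} X_{i-2} for i >= 3."""
    return ["b", "a"] + [(i - 1, i - 2) for i in range(3, n + 1)]

slp = fibonacci_slp(5)
lengths = slp_lengths(slp)
print("".join(slp_char_at(slp, lengths, i) for i in range(lengths[-1])))  # abaab
```

The same two functions work unchanged for `fibonacci_slp(220)`, where the generated word is far too long to write out.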

Let $occ(P, T)$ be a word of length $|T|$ over the alphabet $\{0, 1\}$ such that
$occ[i] = 1$ iff $i$ is an ending position of an occurrence of $P$ in $T$.

Theorem 3. There is an SLP for $occ(P, T)$ which is of polynomial size with
respect to $|SLP(T)|$ and $|P|$.

Surprisingly, the set of occurrences of a fixed pattern in a compressed text can
have a very short SLP representation while its representation as a union of
arithmetic progressions is exponential. We take the sequence of words $W_i$ over the
alphabet $\{a, b\}$ such that the $k$-th letter of $W_i$ is $a$ iff the number $k$, written in
ternary, does not contain the digit 1. (The positions in $W_i$ are counted starting
from 0.)

Theorem 4. There are strings $W_i$ whose SLP representation is linear but the
set of all occurrences of a single letter $a$ in $W_i$ is not linearly-succinct.



2.4 The Lempel-Ziv Compression 

The LZ compression (see [60]) gives a very natural way of representing a string
and it is a practically successful method of text compression. We consider the
same version of the LZ algorithm as in [18] (this is called LZ1 in [18]). Intuitively,
the LZ algorithm compresses the text because it is able to discover some repeated
subwords. We consider here the version of the LZ algorithm without self-referencing
but our algorithms can be extended to the general self-referential case. Assume
that $\Sigma$ is an underlying alphabet and let $w$ be a string over $\Sigma$. The factorization
of $w$ is given by a decomposition $w = c_1 f_1 c_2 \ldots f_k c_{k+1}$, where $c_1 = w[1]$ and
for each $1 \le i \le k$, $c_i \in \Sigma$ and $f_i$ is the longest prefix of $f_i c_{i+1} \ldots f_k c_{k+1}$ which
appears in $c_1 f_1 c_2 \ldots f_{i-1} c_i$. We can identify each $f_i$ with an interval $[p, q]$, such
that $f_i = w[p..q]$ and $q \le |c_1 f_1 c_2 \ldots f_{i-1} c_i|$. If we drop the assumption related
to the last inequality then a self-referencing occurs ($f_i$ is the longest prefix which
appears before but does not necessarily terminate at the current position). We assume
that this is not the case.

Example. The factorization of the word aababbabbaababbabba# is given by:

$c_1\ f_1\ c_2\ f_2\ c_3\ f_3\ c_4\ f_4\ c_5$ = a a b ab b abb a ababbabba #.

After identifying each subword $f_i$ with its corresponding interval we obtain the
LZ encoding of the string. Hence

LZ(aababbabbaababbabba#) = a[1, 1]b[2, 3]b[4, 6]a[2, 10]#.
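
Decoding such an encoding is straightforward, which the following Python sketch illustrates. The token format (a letter, or a pair `(p, q)` of 1-based inclusive positions into the already decoded prefix) is our own representation of the encoding above. For the example word the factor $f_2 = ab$ is the subword $w[2..3]$, so the sketch uses the interval (2, 3) for it.

```python
def lz_decode(tokens):
    """Decode a non-self-referencing LZ encoding.
    A token is a single letter or an interval (p, q) copying w[p..q] (1-based, inclusive)
    from the part of the word decoded so far."""
    out = []
    for tok in tokens:
        if isinstance(tok, str):
            out.append(tok)
        else:
            p, q = tok
            if not 1 <= p <= q <= len(out):
                raise ValueError("interval must lie inside the decoded prefix")
            out.extend(out[p - 1:q])
    return "".join(out)

encoding = ["a", (1, 1), "b", (2, 3), "b", (4, 6), "a", (2, 10), "#"]
print(lz_decode(encoding))   # aababbabbaababbabba#
```

Each interval can at most double the length of the decoded word, which is why $n$ tokens can describe a word of length exponential in $n$.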



Lemma 4. For each string $w$ given in LZ-compressed form we can construct an
SLP generating $w$ of size $O(n^{?})$, where $n = |LZ(w)|$.

Theorem 5. [18] The compressed pattern-matching for LZ-encoded texts can
be done in $O(n \cdot \log^2(|T|/n) + |P|)$ time.

When we pass to the fully compressed pattern-matching the situation is more
complicated. We can use Lemma 4 to reduce the problem to the SLP-compressed
matching. Another approach was used in [23], where a generalization of SLPs has
been introduced and applied: a composition system. Denote by $Eq(n)$ the time to
check subwords-equality of LZ-compressed texts. The technique of randomized
fingerprinting has been used to show:

Theorem 6. [24] Assume a string $w$ is given in LZ-compressed form, then we
can preprocess $w$ in $O(n^{?} \log n)$ time in such a way that each subword equality
query about $w$ can be answered in $O(n \cdot \log\log n)$ time with a very small probability
of error.

Theorem 7. [23] The Fully Compressed Matching Problem for LZ-compressed
strings can be solved in $O(n \log n \log^{?} |T| \cdot Eq(n \log n))$ time.

Theorem 8. The linearly-succinct representation of the set $Periods(T)$ can be
computed in $O(n \log n \log^{?} |T| \cdot Eq(n \log n))$ time.

Theorem 9. [23] We can test if an LZ-compressed text contains a palindrome
in $O(n \log n \log^{?} |T| \cdot Eq(n \log n))$ time.
We can test for square-freeness in $O(n^{?} \log^{?} n \log^{?} |T| \cdot Eq(n \log n))$ time.

Theorem 10. [23] Given a text $T$, its code $LZ(T)$ can be computed on-line with
$O(n^{?} \log^{?} n)$ delay using $O(n \log n)$ space.

2.5 LZW Compression 

The main difference between LZ and LZW compression is in the choice of code
words. Each next code word is of the form $wa$, where $w$ is an earlier code-word and
$a$ is a letter. The text is scanned left to right, and each time the next largest
code-word is constructed. The code-words form a trie, and the text to be compressed
is encoded as a sequence of names of prefixes of the trie. Denote by $LZW(w)$ the
Lempel-Ziv-Welch compression of $w$. This type of compression cannot compress
the string exponentially; it is less interesting from the theoretical point of view
but much more interesting from the practical point of view.
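
The scanning process described above can be sketched in Python as follows. This is the textbook dictionary-based formulation of LZW, given here as an illustration with our own function names; a dictionary of strings stands in for the trie, and code-words are numbered in order of creation after the letters of the alphabet.

```python
def lzw_compress(w, alphabet):
    """Return the list of code-word numbers for the non-empty string w."""
    table = {a: idx for idx, a in enumerate(sorted(alphabet))}
    out, cur = [], ""
    for ch in w:
        if cur + ch in table:
            cur += ch                       # extend the current code-word
        else:
            out.append(table[cur])          # emit the longest known code-word
            table[cur + ch] = len(table)    # new code-word: earlier code-word + letter
            cur = ch
    if cur:
        out.append(table[cur])
    return out

def lzw_decompress(codes, alphabet):
    """Inverse of lzw_compress for a non-empty code list."""
    words = sorted(alphabet)
    prev = words[codes[0]]
    out = [prev]
    for code in codes[1:]:
        # a code may refer to the code-word that is just being created
        cur = words[code] if code < len(words) else prev + prev[0]
        words.append(prev + cur[0])
        out.append(cur)
        prev = cur
    return "".join(out)

print(lzw_compress("abababab", "ab"))        # [0, 1, 2, 4, 1]
print(len(lzw_compress("a" * 5050, "a")))    # 100
```

The second call illustrates why the compression cannot be exponential: the code-words of a unary text have lengths 1, 2, 3, ..., so a text of length 5050 = 1 + 2 + ... + 100 still needs 100 code-words.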

Lemma 5. $|LZW(w)| = \Omega(\sqrt{|w|})$.

Theorem 11. [2] The compressed matching problem for LZW-encoded strings
can be solved in $O(\min\{n + m^2, n \cdot \log(m) + m\})$ time.

Theorem 12. [25] The fully compressed matching problem for LZW-encoded
strings can be solved in $O((n + m) \cdot \log(n + m))$ time.

2.6 Texts Compressed by Using Antidictionaries 

The method of data compression that uses some "negative" information about
the text is quite different from the other compression methods. Assume in this
subsection that the alphabet is binary. Let $AD(w)$ denote the set (called the
antidictionary) of minimal forbidden factors of $w$, that is, words which are
not subwords of $w$. The minimality is in the sense of subword inclusion.
For example

$AD(01001010) = \{000, 10101, 11\}$,

$AD(11010001) = \{0000, 111, 011, 0101, 1100\}$.

The compression method processes $w$ from left to right, symbol by symbol, and
produces the compressed representation $o_1 o_2 \ldots o_N$, where each $o_i$ is a single
letter or an empty string. Assume that we scan the $i$-th letter $a$ of $w$: if there is
a word $ub \in AD(w)$ such that $u$ is a suffix of $w[1 \ldots i - 1]$ then the $i$-th output
word $o_i$ is the empty word, otherwise it is $a$. In other words, if the $i$-th letter is
predictable from $w[1 \ldots i - 1]$ and $AD(w)$, we can skip the $i$-th letter, since it is
"redundant".

Denote $ADC(w) = (AD(w), o_1 o_2 \ldots o_N, N)$, where $N = |w|$.

$ADC(w)$ is the antidictionary compressed representation of $w$.

Let $|AD(w)|$ denote the total length of all strings in $AD(w)$. The size of the
compressed representation is defined as:

$|ADC(w)| = |AD(w)| + |o_1 o_2 \ldots o_N| + \log N$.

Observe that $N$ is an important part of the information necessary to identify $w$.
For example

$ADC(1^{1000}) = (\{0\}, \varepsilon, 1000)$, $ADC(1^{10}) = (\{0\}, \varepsilon, 10)$.

This is a trivial example of an exponential compression. It can happen that
$|ADC(w)| > |w|$, for example

$ADC(11010001) = (\{0000, 111, 011, 0101, 1100\}, 110, 8)$
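
The compression rule and its inverse can be written down in a few lines of Python. The sketch below is a direct, unoptimized transcription of the rule over the binary alphabet (each step scans the whole antidictionary, so it does not achieve the linear time of the theorems that follow); the function names are ours, and the antidictionary is passed in as given rather than computed from $w$.

```python
def ad_compress(w, ad):
    """Keep the i-th letter only if no word ub in ad has u as a suffix of w[:i]."""
    out = []
    for i, a in enumerate(w):
        prefix = w[:i]
        predictable = any(prefix.endswith(x[:-1]) for x in ad)
        if not predictable:
            out.append(a)
    return "".join(out)

def ad_decompress(ad, stream, n):
    """Rebuild the word of length n from the antidictionary and the kept letters."""
    letters = iter(stream)
    w = ""
    for _ in range(n):
        forbidden = [x[-1] for x in ad if w.endswith(x[:-1])]
        if forbidden:
            # the next letter cannot be the forbidden one, so it is its complement
            w += "1" if forbidden[0] == "0" else "0"
        else:
            w += next(letters)
    return w

ad = {"0000", "111", "011", "0101", "1100"}
print(ad_compress("11010001", ad))        # 110
print(ad_decompress(ad, "110", 8))        # 11010001
print(repr(ad_compress("1" * 1000, {"0"})))   # ''
```

The outputs agree with the examples $ADC(11010001)$ and $ADC(1^{1000})$ in the text.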

Theorem 13. [10] $ADC(w)$ can be computed in time $O(n + N)$, where $n = |ADC(w)|$, $N = |w|$.

Theorem 14. [57] All occurrences of a pattern $P$ in $w$ can be found in time
$O(n + |P|^2 + r)$, where $n = |ADC(w)|$, and $r$ is the number of occurrences of
the pattern.

3 Sequential Searching in Compressed Arrays 

3.1 2-Dimensional Run-Length Encoding 

For a 2-dimensional array $T$ denote by $2RLC(T)$ the concatenation of run-length
encodings of the rows of $T$.

Theorem 15. [4] Assume we are given the run-length encoding of a 2D-text $T$ and
an explicitly given 2D-pattern $P$, where $n = |2RLC(T)|$ and $M = |P|$. Then
we can check for an occurrence of $P$ in $T$ in $O(n + M)$ time.



3.2 2-Dimensional Straight-Line Programs 

A 2D-text can be represented by a 2-dimensional straight-line program (SLP),
that uses constants from the alphabet, and two types of assignment statements:

Horizontal concatenation: $A \leftarrow B \obar C$, which concatenates 2D-texts $B$ and
$C$ (both of equal height).

Vertical concatenation: $A \leftarrow B \ominus C$, which puts the 2D-text $B$ on top of $C$
(both of equal width).

An SLP $\mathcal{P}$ of size $n$ (we write $|\mathcal{P}| = n$) consists of $n$ statements of the above form,
where the result of the last statement is the compressed 2D-text. The result of a
correct SLP, $\mathcal{P}$, is denoted $Decompress(\mathcal{P})$. We say that $\mathcal{P}$ is a compressed form
of $T = Decompress(\mathcal{P})$. The area of $T$, denoted $|T|$, can be exponential in $|\mathcal{P}|$.
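
A small Python sketch shows how quickly the area can grow. The statement encoding below (a one-letter string, or a triple `("h", j, k)` / `("v", j, k)` referring to earlier 1-based statements) is our own assumption for illustration; a 2D-text is stored as a list of equally long row strings.

```python
def hcat(b, c):
    """Horizontal concatenation of two 2D-texts of equal height."""
    if len(b) != len(c):
        raise ValueError("heights differ")
    return [rb + rc for rb, rc in zip(b, c)]

def vcat(b, c):
    """Vertical concatenation: b on top of c, both of equal width."""
    if len(b[0]) != len(c[0]):
        raise ValueError("widths differ")
    return b + c

def decompress_2d(slp):
    """Evaluate the statements in order and return the 2D-text of the last one."""
    vals = []
    for st in slp:
        if isinstance(st, str):
            vals.append([st])
        else:
            op, j, k = st
            combine = hcat if op == "h" else vcat
            vals.append(combine(vals[j - 1], vals[k - 1]))
    return vals[-1]

# 2 x 2 chess-board from two constants, then one doubling step (two statements)
chess = ["#", ".", ("h", 1, 2), ("h", 2, 1), ("v", 3, 4), ("h", 5, 5), ("v", 6, 6)]
for row in decompress_2d(chess):
    print(row)
```

Every further pair of statements `("h", m, m)`, `("v", m + 1, m + 1)` quadruples the area, so $n$ statements can describe an image with an area exponential in $n$.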

Example. Hilbert's curve can be viewed as an image which is exponentially
compressible in terms of SLPs. An SLP which describes the Hilbert's curve
$H_n$ uses six (terminal) symbols, each a unit box containing a piece of the curve,
and 12 variables for each $0 \le i \le n$.
A variable with index $i$ represents a text square of size $2^i \times 2^i$ containing part
of a curve. The dots in the boxes show the places where the curve enters and
leaves the box.

The 2D-text $T$ describing the 3rd Hilbert's curve is shown in
Figure 3. It is composed of four smaller square 2D-texts, given by variables of
index 2, according to one of the composition rules. In the figure the black dots indicate
how $T$ was defined by a single statement combining these four variables.
The $1 \times 1$ text squares (the variables of index 0) are each defined by a single
terminal symbol.

The text squares for variables indexed by $i \ge 1$ are rotations of a few basic text
squares. Each variable of index $i$ is composed of four variables of index $i - 1$.

[The box-shaped glyphs naming the six terminals and the 12 variables, and the
composition rules written with them, are not recoverable from this copy.]

Theorem 16. [9]

(1) There exists a polynomial time randomized algorithm for testing equality of
two 2D-texts, given their SLPs.

(2) Fully compressed pattern checking for 2D-texts is co-NP-complete.

(3) Fully compressed matching for 2D-texts is $\Sigma_2^P$-complete.

(4) Compressed matching for 2D-texts is NP-complete.



3.3 2D-Compressions in Terms of Finite Automata 

Finite automata can describe quite complicated images, for example deterministic
automata can describe the Hilbert's curve with a given resolution, see Figure 3,
while weighted automata can describe even much more complicated curves,
see also [13,14,16]. The automata can also be used as an effective tool to compress
two-dimensional images, see [35,15].

Fig. 3. An example of a 2D-text $T$ and a pattern $P$. The pattern occurs twice
in the text

Our alphabet is $\Sigma = \{0, 1, 2, 3\}$, the elements of which correspond to the four
quadrants of a square array, listed in a fixed order. A word $w$ of length $k$ over $\Sigma$
can be interpreted, in a natural way, as a unique address of a pixel $x$ of a $2^k \times 2^k$
image (array); we write $address(x) = w$. The length $k$ is called the resolution of
the image. For a language $L \subseteq \Sigma^{*}$ denote by $Image_k(L)$ the $2^k \times 2^k$ black-and-white
image such that the color of a given pixel $x$ is black iff $address(x) \in L$.
Formally, a weighted language is a function which associates with each word $w$
a value $weight_L(w)$. A weighted language $L$ over $\Sigma$ and a resolution $k$ determine
the gray-tone image $Image_k(L)$ such that the color of a given pixel $x$ equals
$weight_L(address(x))$. We define

$Image_k(A) = Image_k(L(A))$

where $L(A)$ is the language accepted by $A$.

Example.

$Image_k(\Sigma^{k-1}(0 \cup 3))$ is the $2^k \times 2^k$ black-and-white chess-board.
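
The addressing scheme can be made concrete with a short Python sketch. The text only says that the quadrants are listed in a fixed order; the particular order used below (digit `d` contributes bit `d & 1` to the column and bit `d >> 1` to the row, so that quadrants 0 and 3 are diagonally opposite) is our assumption, as are the function names. A language is passed as a predicate on addresses instead of an automaton.

```python
from itertools import product

def pixel(address):
    """Map a word over {0, 1, 2, 3} to pixel coordinates (x, y)."""
    x = y = 0
    for d in address:
        x = 2 * x + (d & 1)
        y = 2 * y + (d >> 1)
    return x, y

def image(k, in_language):
    """2^k x 2^k black-and-white image: pixel is 1 iff its address is in the language."""
    n = 2 ** k
    grid = [[0] * n for _ in range(n)]
    for addr in product(range(4), repeat=k):
        if in_language(addr):
            x, y = pixel(addr)
            grid[y][x] = 1
    return grid

chess = image(3, lambda a: a[-1] in (0, 3))       # addresses ending in 0 or 3
sierpinski = image(3, lambda a: 3 not in a)       # addresses over {0, 1, 2} only
for row in sierpinski:
    print("".join("#" if v else "." for v in row))
```

With this quadrant order the first language yields the chess-board of the example, and the second yields the square part of Sierpinski's triangle discussed in Section 4.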

Weighted finite automata and deterministic automata correspond to images 
of infinite resolution. Since we consider images of a given finite resolution k we 
can assume that the considered automata are acyclic. 

Theorem 17. [20]

(1) Compressed matching for deterministic automata can be solved in polynomial
time.

(2) Compressed matching for weighted automata is NP-complete.



3.4 2D-Compression Using LZ-Encodings 

A natural approach to (potentially exponential) compression of images is to
scan a given two-dimensional array $T$ in some specified order, obtain a linear
version of $T$ called $linear(T)$, and then apply Lempel-Ziv encoding to the string
$linear(T)$.

The Hilbert's curve $H_k$ corresponds to a strongly regular traversal of a $2^k \times 2^k$
grid, starting in the bottom left corner of an $n \times n$ array $T$, and ending at the right
bottom corner, where $n = 2^k$. An example of $H_3$ is illustrated in Figure 3.

We denote now by $H\text{-}linear(T)$ the linearization of $T$ according to the
Hilbert's curve. The 2LZ-compression is defined as follows:

$2LZ(T) = LZ(H\text{-}linear(T))$.
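
The linearization step can be sketched in Python with the standard iterative conversion from a position along the Hilbert curve to grid coordinates. This is a common textbook formulation, used here as an illustration; the paper does not prescribe a particular procedure, and the function names are ours. Coordinates are counted from the bottom left corner, and the array is given as a list of row strings from top to bottom.

```python
def hilbert_d2xy(n, d):
    """Cell (x, y) visited at step d of the Hilbert traversal of an n x n grid
    (n a power of 2); (0, 0) is the bottom left corner."""
    x = y = 0
    t = d
    s = 1
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:                      # rotate or reflect the current sub-square
            if rx == 1:
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y

def h_linear(T):
    """Read the square array T (rows listed top to bottom) along the Hilbert curve."""
    n = len(T)
    cells = (hilbert_d2xy(n, d) for d in range(n * n))
    return "".join(T[n - 1 - y][x] for x, y in cells)

print([hilbert_d2xy(2, d) for d in range(4)])   # [(0, 0), (0, 1), (1, 1), (1, 0)]
print(h_linear(["ab", "cd"]))                   # cabd
```

Applying an LZ encoder to the string returned by `h_linear` then gives the 2LZ encoding of the definition above.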

Such a type of encoding was already considered in [39]. 2LZ-compression is as
strong as finite automata compression, with respect to polynomial reduction.

Theorem 18. If an automaton $A$ describes an image $T$ then $2LZ(T)$ is of polynomial
size w.r.t. $|A|$.



Theorem 19. Searching for an occurrence of a row of ones in a 2LZ compressed 
image is NP-hard. 

Surprisingly, there are black-and-white images whose 2LZ encoding is small
while any deterministic acyclic finite automaton encoding them must be exponential.

Theorem 20. For each $m$ there is an image $W_m$ such that $2LZ(W_m)$ is of size
$O(m)$ but each deterministic automaton encoding the image $W_m$ contains at least
4^-^ states. 

4 Compressibility of Subsegments 

In this section we consider the problem of constructing a compressed representation
of a part of a compressed object. The compressed representation of a part
can be larger than that of the whole object. We start with finite-automata
representations. In our considerations we may restrict to acyclic automata since we
consider only finite resolution images.

Example. $Image(\{0, 1, 2\}^k) = S_k$ is the $2^k \times 2^k$ black-and-white square part of
Sierpinski's triangle, see Figure 4 for the case $k = 4$. The corresponding smallest
acyclic deterministic automaton accepting all paths describing black pixels has
5 essential states, but 9 essential states are needed to describe the $8 \times 8$
subarray $R$ of $S_4$ indicated in Figure 4. (A state is essential iff it is on an
accepting path; other states are treated as redundant.)

Theorem 21. [9] Assume the compression is in terms of deterministic 

automata. Let n be the size of a deterministic automaton describing T. 

(a) The compressed representation of a subsquare R of T can be computed in 

time. 

(b) For each subimage $R$ of an image $T$ there is a deterministic automaton
describing $R$ of size $O(n^{?})$. There are images $T$ and their subimages $R$ such
that the smallest deterministic automaton for $R$ requires $\Omega(n^{?})$ states.

Fig. 4. The image $S_4$ and its smallest acyclic automaton. Edges which are not
on an accepting path are disregarded. (Labels in the figure: subsegment $R$ to be
cut off; starting state; accepting state; automaton states s0 to s4.)


The situation is much different for 2-dimensional straight-line programs. 

Theorem 22. [29] For each $n$ there exists an SLP of size $n$ describing a text
image $A_n$ and a subrectangle $B_n$ of $A_n$ such that the smallest SLP describing
$B_n$ has exponential size.



5 Parallel Searching in Compressed Texts and Arrays 

The difference between sequential and NC-computations for compressed texts 
can be well demonstrated by the following problem: compute the symbol T[i] 
where T is the text given in its LZ-version, and i is given in binary. 

This task has a trivial sequential linear time algorithm (which performs computations
sequentially in the order the recurrences are written). However, if we
ask for an NC-algorithm for the same problem the situation is different, and it
becomes quite difficult. Any straightforward attempt at using the doubling technique
and squaring the matrix of positions fails, since the number of positions
in the text defined by $n$ recurrence equations can be $\Omega(2^n)$.

Theorem 23. [26]

(1) The problem of testing for any occurrence of an uncompressed pattern in an
LZ-compressed text is P-complete.

(2) The problem of computing a symbol on a given position in an LZ-compressed
text is P-complete.



Theorem 24. [26] The SLP-compressed matching problem can be solved in
$O(\log(m) \cdot \log(n))$ time with $O(n)$ processors.

Theorem 25. [26] The pattern-matching problem for SLP-compressed 2D-texts
can be solved in

(1) 0(log^(n + m)) time with 0{n^ • mPn^) proeessors, or 

(2) $O(n + \log m)$ time with $O(n \cdot (n + m))$ processors.

Theorem 26. [25] There is an almost optimal NC algorithm for fully compressed
LZW-matching.

Theorem 27. [6,12]

The computation of $LZW(w)$ is P-complete.

The computation of $LZ(w)$ is in NC.

6 The Compressed Language Membership Problems 

The language membership problem is to check if $w \in L$, given $Compress(w)$
and a description of a formal language $L$.

Theorem 28.

(a) We can test the membership problem for LZ-compressed words in a language
described by a given regular expression $W$ in $O(n \cdot m^{?})$ time, where $m = |W|$.

(b) We can decide for LZ-compressed words the membership in a language
described by a given deterministic automaton $M$ in $O(n \cdot m)$ time, where $m$ is
the number of states of $M$.
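
The idea behind a bound of the form $O(n \cdot m)$ is easiest to show for SLP-compressed words, which is the variant sketched below (Theorem 28 itself is stated for LZ-compressed words, and this is not claimed to be the construction used there). For every variable we store the function that maps a state to the state reached after reading the word of that variable; concatenation becomes composition of two such functions. The rule encoding and all names are our own assumptions.

```python
def slp_in_dfa(slp, delta, states, start, accepting):
    """slp: rules as a one-letter string or a pair (j, k) of earlier 1-based indices.
    delta: dict mapping (state, symbol) to the next state.
    Return True iff the word generated by the last variable is accepted."""
    maps = []   # maps[v][q] = state reached from q after reading the word of variable v + 1
    for rule in slp:
        if isinstance(rule, str):
            maps.append({q: delta[(q, rule)] for q in states})
        else:
            j, k = rule
            first, second = maps[j - 1], maps[k - 1]
            maps.append({q: second[first[q]] for q in states})
    return maps[-1][start] in accepting

# automaton counting the letters a modulo 3
states = [0, 1, 2]
delta = {(q, "a"): (q + 1) % 3 for q in states}
# X_1 := a; X_{i+1} := X_i X_i, so the last variable generates a^(2^100)
doubling = ["a"] + [(i, i) for i in range(1, 101)]
print(slp_in_dfa(doubling, delta, states, 0, {1}))   # True, since 2^100 mod 3 = 1
```

Each rule costs one pass over the $m$ states, so $n$ rules are handled in $O(n \cdot m)$ steps regardless of the length of the generated word.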

We use the following problem to show NP-hardness of several compressed
recognition problems.

SUBSET SUM problem:

Input instance: Finite set $A = \{a_1, a_2, \ldots, a_n\}$ of integers and an integer $K$.
The size of the input is the number of bits needed for the binary representation
of numbers in $A$ and $K$.

Question: Is there a subset $A' \subseteq A$ such that the sum of the elements in $A'$ is
exactly $K$?

Lemma 6. The problem SUBSET SUM is NP-complete.

Proof. See [31], and [22], p. 223.

Theorem 29. [34,53] Testing the membership of a compressed unary word in
a language described by a star-free regular expression with compressed constants
is NP-complete.

It is not obvious if the previously considered problem is in NP for regular
expressions containing the operation *, since in this case there is no polynomial bound
on the length of accepting paths of $A$. There is a simple argument in the case of
unary languages. In the proof of the next theorem an interesting application of
the Euler path technique to unary language recognition is shown.

Theorem 30. [53]

(a) The problem of checking membership of a compressed unary word in a
language described by a given regular expression with compressed constants is
in NP.

(b) The problem of checking membership of a compressed word in a language
described by a semi-extended regular expression is NP-hard.

Theorem 31. The problem of checking membership of a compressed word in a
given linear context-free language $L$ is NP-hard, even if $L$ is given by a context-free
grammar of a constant size.

Proof. Take an instance of the subset-sum problem with the set $A = \{a_1, a_2, \ldots, a_n\}$
of integers and an integer $K$. Define the following language:



$L$ is obviously a linear context-free language generated by a linear context-free
grammar of a constant size. We can reduce an instance of the subset-sum problem
to the membership problem:



Theorem 32.

(a) The problem of checking membership of a compressed word in a given linear
cfl is in $NSPACE(n)$.

(b) The problem of checking membership of a compressed word in a given cfl is
in $DSPACE(n^2)$.

Proof. We can easily compute in linear space a symbol on a given position
in a compressed input word. Now we can use a space-efficient algorithm for
the recognition of context-free languages. It is known that linear languages
can be recognized in $O(\log N)$ nondeterministic space and general cfls can be
recognized in $O(\log^2 N)$ deterministic space, where $N$ is the size of the uncompressed
input word. In our case $N = O(2^n)$, which gives the required $NSPACE(n)$ and
$DSPACE(n^2)$ complexities.



$L = \{\, \ldots \# d^{\ldots} : t \ge 1$ and there is a subset $A' \subseteq \{v_1, \ldots, v_t\}$ such that $\ldots \,\}$

Theorem 33. The problem of checking membership of a compressed unary word
in a given cfl is NP-complete.

7 Word Equations 

Word equations are used to describe properties and relations of words, e.g. 
pattern-matching with variables, imprimitiveness, periodicity, and conjugation, 
see [30]. The main algorithm in this area was Makanin’s algorithm for solv- 
ing word equations, see [41]. The time complexity of the algorithm is too high, 
and the algorithm is too complicated. Recently much simpler algorithms were 
constructed by W. Plandowski [51,52]. 

Let $\Sigma$ be an alphabet of constants and $\Theta$ be an alphabet of variables. We
assume that these alphabets are disjoint. A word equation is a pair of words
$(u, v) \in (\Sigma \cup \Theta)^* \times (\Sigma \cup \Theta)^*$, usually denoted by $u = v$. The size of an equation is
the sum of lengths of $u$ and $v$. A solution of a word equation $u = v$ is a morphism
$h : (\Sigma \cup \Theta)^* \to \Sigma^*$ such that $h(a) = a$, for $a \in \Sigma$, and $h(u) = h(v)$. For example
assume we have the equation

$ab\, x_1 x_2 x_2 x_3 x_3 x_4 x_4 x_5 = x_1 x_2 x_3 x_4 x_5 x_6$,

and the lengths of the $x_i$'s are consecutive Fibonacci numbers. Then the solution
$h(x_i)$ is the $i$-th Fibonacci word.
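
The example can be verified mechanically. The Python sketch below applies a candidate morphism to both sides of the equation $ab\, x_1 x_2 x_2 x_3 x_3 x_4 x_4 x_5 = x_1 x_2 x_3 x_4 x_5 x_6$ and compares the results. The concrete values are our reading of the example: we take lengths 1, 1, 2, 3, 5, 8 and the Fibonacci-type words $x_1 = a$, $x_2 = b$, $x_i = x_{i-2} x_{i-1}$, which is an assumption about the indexing convention intended in the text.

```python
def apply_morphism(h, word):
    """word is a list of symbols; variables are replaced via h, constants stay as they are."""
    return "".join(h.get(s, s) for s in word)

# assumed Fibonacci-type solution: x1 = a, x2 = b, x_i = x_{i-2} x_{i-1}
h = {"x1": "a", "x2": "b"}
for i in range(3, 7):
    h[f"x{i}"] = h[f"x{i - 2}"] + h[f"x{i - 1}"]

lhs = ["a", "b", "x1", "x2", "x2", "x3", "x3", "x4", "x4", "x5"]
rhs = ["x1", "x2", "x3", "x4", "x5", "x6"]

print([len(h[f"x{i}"]) for i in range(1, 7)])                  # [1, 1, 2, 3, 5, 8]
print(apply_morphism(h, lhs) == apply_morphism(h, rhs))        # True
```

Checking a proposed solution in this explicit way takes time proportional to the length of the solution, which is what motivates the compressed verification discussed next.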

It is known that the solvability problem for word equations is NP-hard, even
if we consider (short) solutions with the length bounded by a linear function and
the right side of the equation contains no variables, see [5].

The main open problem is to show the following:

Conjecture A: The problem of solving word equations is in NP.

Conjecture B: Let $N$ be the minimal length of the solution (if one exists).
Then $N$ is singly exponential w.r.t. $n$.

The author believes that both questions have positive answers.

A motivation to consider compressed solutions follows from the following fact
(which is an application of fully compressed pattern-checking, in this case
checking the occurrence of one compressed string at the beginning of another one).

Lemma 7. If we have LZ-encoded values of the variables then we can verify the
word equation in polynomial time with respect to the size of the equation and the
total size of the given LZ-encodings.

Theorem 34. Assume $N$ is the size of a minimal solution of a word equation of
size $n$. Then each solution of size $N$ can be LZ-compressed to a string of size
$O(n^2 \log^2(N)(\log n + \log\log N))$.

As a direct consequence of Lemma 7 and Theorem 34 we have:

Conjecture B implies Conjecture A.

Theorem 35. Assume the lengths of all variables are given in binary by a function
$f$. Then we can test solvability in deterministic polynomial time, and produce
a polynomial-size LZ-compression of the lexicographically first solution (if there is
any).

8 Morphic Representations 

There is a classical (in terms of formal language theory) short description of
strings in terms of morphisms. Assume $w = \phi^k(a)$, where $\phi$ is a morphism
and $a$ is a letter in the alphabet. Words of the type $\phi^k(a)$ can be interpreted
as instructions in a "turtle language" to draw fractal images, see [49]. Hence
the calculation of the $i$-th letter has a practical meaning when computing a
local structure of a fractal without computing the whole object. The pair $(k, \phi)$
can be treated as a short morphic description of $w$, denoted by $morphic\_desc(w)$.
The length $n = |morphic\_desc(w)|$ of the description is the size of the binary
representation of $k$ and $\phi$.

For some morphisms the computation of the $i$-th letter could be especially
simple. This is the case for the Thue-Morse morphism

$\psi(0) = 01$, $\psi(1) = 10$

Assume we count positions starting from 0. Let $bin(i)$ be the binary representation
of $i$. Then:

$\psi^k(0)[i] = 1 \iff bin(i)$ contains an odd number of ones

For example the 1024-th position of $\psi^k(0)$, for any $k > 10$, is 1 since
$bin(1024) = 10000000000$ contains an odd number of ones. However such a simple
computation of the $i$-th letter does not apply to every morphism.
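
The parity rule is easy to check against a direct iteration of the morphism. The following Python sketch does so for the first 2048 positions; the helper names are ours.

```python
def thue_morse_letter(i):
    """Letter at 0-based position i: 1 iff bin(i) contains an odd number of ones."""
    return bin(i).count("1") % 2

def iterate(morphism, word, k):
    """Apply the morphism k times by explicit rewriting (exponential in k)."""
    for _ in range(k):
        word = "".join(morphism[c] for c in word)
    return word

psi = {"0": "01", "1": "10"}
w = iterate(psi, "0", 11)               # explicit word of length 2**11
print(w[:16])                           # 0110100110010110
print(thue_morse_letter(1024), w[1024]) # 1 1
```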

Theorem 36. Let $\phi$ be a morphism. There is a polynomial time algorithm to
compute the $i$-th letter of $\phi^k(a)$, where the input size is the total number of bits
for $i$, $\phi$, and $k$.
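
A simplified version of such a computation can be sketched in Python. Note the restriction: the sketch below spends time proportional to $k$ (it tabulates the lengths $|\phi^j(c)|$ for all $j \le k$), so it is polynomial in the value of $k$ and not in the number of bits of $k$ as Theorem 36 requires. It is our own illustration of the descent idea, with hypothetical names, and not the algorithm behind the theorem.

```python
def letter_of_power(phi, a, k, i):
    """Letter at 0-based position i of phi^k(a), without building the word.
    phi: dict mapping each letter to its image (a string over the same letters)."""
    alphabet = list(phi)
    # length[j][c] = |phi^j(c)|
    length = [{c: 1 for c in alphabet}]
    for _ in range(k):
        prev = length[-1]
        length.append({c: sum(prev[d] for d in phi[c]) for c in alphabet})
    if not 0 <= i < length[k][a]:
        raise IndexError(i)
    c = a
    for j in range(k, 0, -1):
        # phi^j(c) is the concatenation of phi^(j-1)(d) over the letters d of phi(c)
        for d in phi[c]:
            if i < length[j - 1][d]:
                c = d
                break
            i -= length[j - 1][d]
    return c

thue_morse = {"0": "01", "1": "10"}
print(letter_of_power(thue_morse, "0", 200, 1024))   # 1
```

Here the word $\psi^{200}(0)$ has length $2^{200}$, yet the letter is found after 200 descent steps.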

Instead of morphic functions $\phi$ we can use a finite-state transducer function
$\lambda_A$, where $A$ is a finite state transducer (deterministic finite automaton with an
output function). The replacement of $\phi$ by $\lambda_A$ has dramatic consequences.

Theorem 37. [56] Let $A$ be a finite state transducer. The problem of computing
the $i$-th letter of $\lambda_A^k(a)$ is EXPTIME-hard.

9 Final Remarks 

We have surveyed complexity-theoretical results related to the processing of
large compressed texts and arrays without decompression. However we have
not discussed one important aspect: the practicality of the algorithms. Recently many
related practical issues were investigated, see [32,33,45,46,47]. The whole area is
in an initial stage.


Abstract. The World Wide Web offers a great deal of potential in supporting
cross-platform cooperative work within locally dispersed working
groups. GMD has developed the BSCW System, a Web based groupware
tool using the metaphor of shared virtual workspaces. The system
is particularly useful (and already used by a large community) for
cooperation between researchers in distributed environments. This paper
describes the principles, architecture and functionality of the current
(August 1999) version.



1 Introduction 

Collaboration between researchers involves a rich set of modes and means of 
cooperation. For example, several researchers may meet spontaneously, e.g., at 
a conference, and discuss new research ideas. They may decide to write a joint 
paper, distribute the off-line drafting of different sections of the paper to indi- 
viduals, have face to face meetings to discuss the drafts, maybe with mutual 
reviews between meetings, until the final paper eventually emerges and is pre- 
sented to the scientific community. Depending on the area of research, besides 
textual communication additional media such as graphics, spreadsheets, anima- 
tions, presentation of software or the results of experiments will be involved in 
their cooperation. 

To enable efficient ways of cooperation, these collaboration processes need to 
be supported by electronic means, in particular, when cooperation takes place 
within locally dispersed groups. These electronic cooperation tools need to sup- 
port the usual work practices of researchers, in particular, they need to provide 

— a rich variety of tools for asynchronous and synchronous collaboration, 

— a smooth transition between asynchronous and synchronous modes of col- 
laboration, 

— a close integration into the normal working environments of the users, and 

— cross-platform interoperability, since in general cross-organisational research 
groups use a variety of platforms. 

In recent years the Internet, and the World Wide Web (WWW) in particular,
have become the most important infrastructure for communication within the
research community. Email over the Internet has emerged as the primary means
of interchanging multimedia information between researchers, and the WWW
has become an important medium for dissemination of research results. The
WWW has a number of advantages as the basis for tools to support collaborative
information sharing:

— WWW browsers are available for all important platforms and provide access 
to information in a platform independent manner. 

— Browsers offer a simple and consistent user interface across different plat- 
forms. 

— Browsers are already part of the computing environment in many organisa- 
tions. 

— Many organisations have also installed their own Web servers and are familiar 
with server maintenance. 

Given these characteristics, the extension of the Web to provide richer forms
of cooperation support for working groups is both appropriate and desirable.
Therefore, the CSCW (Computer Supported Cooperative Work) research group
in GMD's Institute for Applied Information Technology (FIT) has developed the
BSCW (Basic Support for Cooperative Work) system over the last five years.
Its main goal is to transform the Web from a primarily passive
information repository into an active cooperation medium.

2 General Approach of the BSCW System 

Over the last years, CSCW research has led to a better understanding of how to
support electronic cooperation within groups in various environments. Empirical
studies have shown (see e.g. [3]) the importance of joint information spaces (often 
called shared workspaces) particularly in locally distributed, loosely organised 
groups. The groups use such workspaces for the collection and structuring of any 
kind of information they need (e.g., documents, graphics, spreadsheets, tables, 
or software) to achieve the goals of their collaboration. 

Such workspaces support primarily asynchronous modes of communication.
This mode is normally the most important one for cooperation between
researchers since in such an environment cooperation often consists of parallel,
loosely coupled activities of the individual group members. Synchronous types of
cooperation such as audio/video conferencing or chat sessions are usually of less
importance but should also be supported to some extent. The usage of workflow
systems (which primarily address the execution of a set of tasks following
a predefined sequence with allocation of responsibilities to persons or roles) is
normally not appropriate in such groups.

The BSCW system is based on the metaphor of shared workspaces. The users
access these workspaces with their normal Web browsers; the installation of additional
software at the users' sites is not necessary. A further focus of the system
is keeping the users informed about the activities within their workspaces, i.e.,
the system provides several awareness services.

Although the system primarily supports asynchronous modes of communication,
it also provides some features for synchronous collaboration such as information
about the concurrent presence of other users as well as interfaces to
synchronous communication tools such as chat or audio/video conferencing.

3 Implementation of the BSCW System 

The BSCW system is built upon a standard Web server: the Common Gateway
Interface (CGI), the standard API for Web servers, is used to implement the
BSCW kernel functionality, thereby extending a Web server into a BSCW server.
The system is written entirely in the interpreted programming language Python
(see http://www.python.org/), and the only additional software required to use
the system besides a Web server is the Python interpreter.
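
To make the CGI approach concrete, the following is a minimal sketch of how a
CGI program can turn a request into a call to kernel code. It is not taken from
the BSCW source: the `op` and `folder` query parameters, the handler name and
the response body are assumptions made for illustration. Only the CGI convention
itself (request data passed in environment variables such as QUERY_STRING, and
the response written to standard output after a header block) is standard.

```python
import os
import sys
from html import escape
from urllib.parse import parse_qs


def list_folder(params):
    # Hypothetical operation: a real kernel would consult its object store.
    folder = params.get("folder", ["/"])[0]
    return "<h1>Contents of %s</h1>" % escape(folder)


# Hypothetical operation table; the names are illustrative only.
OPERATIONS = {"list": list_folder}


def handle_request(environ):
    """Build a complete CGI response (headers and body) from CGI variables."""
    params = parse_qs(environ.get("QUERY_STRING", ""))
    op_name = params.get("op", ["list"])[0]
    handler = OPERATIONS.get(op_name)
    if handler is None:
        return ("Status: 404 Not Found\r\n"
                "Content-Type: text/plain\r\n\r\n"
                "unknown operation\n")
    return "Content-Type: text/html\r\n\r\n" + handler(params) + "\n"


if __name__ == "__main__":
    # The Web server starts the script once per request and passes the
    # request through the process environment.
    sys.stdout.write(handle_request(os.environ))
```

Keeping the request handling in a function that takes the environment as an
argument makes the sketch testable without a Web server.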

Since Python provides good support for modularisation, the implementation 
of the kernel functionality and the user interface are largely separated, i.e., with- 
out modifications of the kernel code the interface can be customised to a large 
extent, even by people without detailed understanding of the code. The interface 
definition comprises a set of HTML template pages which can be edited easily. 
GMD provides these interface template pages in German and English, but users 
of the system have translated them to provide interfaces in additional languages 
(e.g., French, Italian, Spanish, Finnish, Russian). 

The modular system design also allows BSCW to be extended rather easily in a
number of different ways. New operation handlers can be added to provide
new functionality or act as interfaces (“wrappers”) to an existing application.
It is also straightforward to access the persistent store of the BSCW system to
store new kinds of objects without modifying the storage routines themselves. In
particular, the choice of the interpreted language Python as the implementation
language directly supports rapid prototyping.
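
The extension mechanism described above can be sketched as follows. This is an
illustration under stated assumptions, not BSCW code: the `operation` decorator,
the `Store` class and the `Poll` object type are hypothetical names, and the
store keeps objects in memory where a real system would write them to disk.

```python
import copy

HANDLERS = {}  # operation name -> handler function (hypothetical registry)


def operation(name):
    """Register a function as the handler for the operation `name`."""
    def register(func):
        HANDLERS[name] = func
        return func
    return register


class Store:
    """Toy object store. It accepts any Python object, so a new kind of
    object needs no change to the storage routines."""

    def __init__(self):
        self._objects = {}

    def put(self, key, obj):
        # Store a snapshot, as a persistent store would.
        self._objects[key] = copy.deepcopy(obj)

    def get(self, key):
        return copy.deepcopy(self._objects[key])


class Poll:
    """A new kind of object introduced by an extension (illustrative)."""

    def __init__(self, question):
        self.question = question
        self.votes = {}


@operation("vote")
def vote(store, key, user, choice):
    """A new operation handler, added without touching existing handlers."""
    poll = store.get(key)
    poll.votes[user] = choice
    store.put(key, poll)
    return poll
```

The point of the design is that both the handler table and the store are open:
adding `Poll` and `vote` required no edits to `Store` or to other handlers.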

An overview of the architecture of the BSCW system is given in Fig. 1. The
main interface between the BSCW server and the BSCW clients (which are normal
Web browsers) is HTTP and HTML. Since HTML is not very powerful with
respect to interface design, the system also contains an additional Java based
interface (using XML) which was released with version 3.3 of the system in
June 1999. (This interface is described below in more detail; see also [4].)

Besides the BSCW server, i.e., a Web server extended with the BSCW functionality,
the BSCW system also comprises a so-called event server which feeds
the so-called monitor applet (a Java applet which can be started from a BSCW
workspace) with events about the presence and activities of other BSCW users (see
below). This is a separate server whose functionality cannot be added to a normal
Web server since HTTP and HTML are insufficient for these particular
features.

The BSCW system runs on Windows NT and various Unix dialects (including
Sun Solaris and Linux). The Microsoft Internet Information Server, the Apache
server, and the AOL and CERN Web servers can be used as the underlying Web
server.

Fig. 1. Architecture of the BSCW System

4 Functionality of the BSCW System 

As mentioned already above, the central metaphor of the BSCW system is the 
shared workspace. In general a BSCW server will manage workspaces for differ- 
ent groups. Users may be members of several workspaces (e.g., one workspace 
corresponding to each project a user is involved in). In addition, users may set 
up private workspaces which they do not share with others. 

A shared workspace can contain different kinds of information such as doc- 
uments, pictures, URL links to other Web pages or FTP sites, threaded discus- 
sions, information about other users and more. The contents of the workspaces 
are usually arranged in a folder hierarchy based on structuring principles agreed 
upon by the members of a workspace. 

A cooperative system has to provide awareness information to allow users to 
coordinate their work. The event services of the BSCW system provide users
with information on the activities of other users, with respect to the objects 
within a shared workspace. 

Events are triggered whenever a user performs an action in a workspace such 
as uploading a new document, downloading (‘reading’) an existing document,
renaming a document and so on. The system records the events and presents 
these events to each user in various forms, e.g., as event icons attached to the 
objects, via email or through messages in the monitor applet (see below). 

The most common way of informing users about events is through the event 
icons attached to objects. Such an icon indicates that a particular event has 
occurred recently. Recent in this context means events which have occurred for 
an object since the user last carried out a catch-up action, an operation by which 
users can tell the system that they are aware of the events that have occurred 
so far and no longer wish to see them (i.e., their event icons) in the workspace. 
Events can be caught up at different levels, from individual objects to complete 
workspace folder hierarchies. Event histories are, of course, personal to each
particular user, e.g., an event may be new to one user but old to other users.

The system distinguishes five types of events, which are represented by five
different event icons accordingly:

— New events indicate that an object has been created since the user last 
caught up. 

— Read events show that an object has been downloaded or read by someone. 

— Change events indicate that an object has been modified. This category 
includes several event types, such as edited, renamed, and so on.

— Move events show that an object has changed its location. This category 
includes delete and undelete events (showing the object has been moved into 
or out of a wastebasket) and cut and drop events (showing the object has
been moved into or out of a user’s personal bag).

— Touch events are displayed for a container such as a folder to show that
something has happened to an object contained inside (either directly or 
lower down in the folder hierarchy). 

Each event entry describes what was done, when and by whom. Although 
this approach for providing group awareness seems very simple at first sight, 
information such as “User A uploaded a new version of document X” , or “User 
B has read document Y” is often very useful for group members in coordinat- 
ing their work and gaining an overview of what has happened since they last 
contacted the BSCW server. 
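
The event and catch-up mechanism described above can be sketched in a few
lines. This is an illustrative model only, not the BSCW implementation: the
class and method names are assumptions, and catch-up is shown for individual
objects only, not for whole folder hierarchies.

```python
import time
from dataclasses import dataclass
from itertools import count

# The five icon categories named in the text.
CATEGORIES = {"new", "read", "change", "move", "touch"}


@dataclass(frozen=True)
class Event:
    seq: int        # position in the global event log
    category: str   # what was done (one of CATEGORIES)
    obj: str        # the object concerned
    user: str       # by whom
    when: float     # when (seconds since the epoch)


class EventLog:
    def __init__(self):
        self._events = []
        self._seq = count(1)
        self._caught_up = {}  # (user, obj) -> seq of the last event seen

    def record(self, category, obj, user):
        if category not in CATEGORIES:
            raise ValueError("unknown event category: %r" % category)
        event = Event(next(self._seq), category, obj, user, time.time())
        self._events.append(event)
        return event

    def catch_up(self, user, obj):
        """Mark every event recorded so far on `obj` as seen by `user`."""
        self._caught_up[(user, obj)] = len(self._events)

    def recent(self, user, obj):
        """Categories whose icons `user` would still see on `obj`."""
        seen = self._caught_up.get((user, obj), 0)
        return {e.category for e in self._events
                if e.obj == obj and e.seq > seen}
```

Because the catch-up marker is stored per user and per object, the same event
can be new to one user and old to another, as the text requires.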

Furthermore, the system contains the following main features: 

— Authentication: Users have to identify themselves by name and password
before they have access to BSCW workspaces. 

— Version management and locking: Documents within a workspace can be put
under version control, which is particularly useful for joint document production,
or they may be locked during an editing session to prevent other users
from accessing documents temporarily. 

— Discussion forums: Users may start a discussion on any topic they like and
the system presents the threads in a style similar to the Internet newsgroups. 

— Access rights: The system contains a sophisticated access rights model which
allows, for example, that some users may have complete control over an 
object in a workspace whereas others have only read access or no access at 
all. 

— Search facilities: Users can specify queries to find objects within BSCW
workspaces based on names, content or specific properties such as document 
author or document modification date. Furthermore, queries may be sub- 
mitted to Web search engines and the result of the query can be imported 
into workspaces. 

— Sorting: As mentioned above, objects in a BSCW workspace can be ordered 
in a hierarchical structure according to the user requirements. Within a 
folder listing users may sort the objects according to several categories such 
as type, name, or date. 

— Document format conversion: These facilities allow users to transform a
document into their format of choice, e.g., a proprietary document format into
HTML, before downloading it.

— Annotation and rating: Users may add notes (meta-information) to objects
in a workspace and rate the quality of objects, e.g., of documents or URLs
that have been created. When several users have given their ratings, the system
computes a median value from the individual ratings.

— Upload and download of archives: Rather than uploading documents one by
one into a BSCW workspace, users may upload an archive such as a zip 
or tar file and extract the archive at the server which may reduce upload 
times significantly. Similarly, users may create an archive containing objects 
of a workspace and then download the archive instead of the individual 
objects. 

— Email integration: Users may easily send email to other users of a BSCW
server and can distribute documents in a BSCW workspace to specified recip- 
ients via email. 

— Special support for meetings: The system allows the creation of so-called
meeting objects which are particularly useful for the preparation of meet- 
ings since they include features such as selection of participants, automatic 
invitation of participants who may accept or decline an invitation, or the 
distribution of meeting notifications via email. 

— Interface to synchronous communication: Through this interface users can
specify synchronous sessions and launch respective tools, e.g., audio/video 
conferencing software or shared whiteboard applications. 

— Anonymous access and moderation: Anonymous access can be allowed to
individual objects or complete folders, e.g., for publishing documents after 
they have been developed within a closed group. The access to public folders 
can be set up in such a way that users can upload documents anonymously 
but that they only become visible to others after they have been approved 
by a moderator. 

— Address book and calendar: Besides the wastebasket and the bag there are
two further objects which are personal to each user: the address book where 
a user may collect the names of other (e.g., frequently contacted) users, and 
the calendar which contains the dates of all meeting objects related to the 
user. 

— Customisation: Through user preferences the users can modify the system
interface to some extent, e.g., whether or not they want to use a JavaScript
or ActiveX enhanced interface, and which functions they want to have avail- 
able in the user interface (see also below). 

— Multi-language support: The interface of the system can be tailored to a
particular language by straightforward extensions. Interfaces in several
languages (e.g., French, Italian, Spanish, Catalan) have been created by users of the system
and are publicly available. Each user may select his or her preferred interface 
language. 

— Administration and configuration: For administrators of a BSCW server
there is a convenient HTML interface for system administration, e.g.,
configuration of the server or user management. A BSCW server is highly
configurable through a set of configuration files which tailor the user interface
of the system to particular requirements, e.g., the set of functions
which shall be accessible to the users.
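
As a small illustration of the rating feature in the list above, the following
sketch combines individual ratings into a median. The numeric rating scale and
the rule that a later rating by the same user replaces the earlier one are
assumptions; the text only states that a median is computed.

```python
from statistics import median


class Ratings:
    """Collects one rating per user and reports the median value."""

    def __init__(self):
        self._by_user = {}

    def rate(self, user, value):
        # Assumption: a new rating by the same user replaces the old one.
        self._by_user[user] = value

    def overall(self):
        """Median of all ratings, or None if nobody has rated yet."""
        if not self._by_user:
            return None
        return median(self._by_user.values())
```

A median is less sensitive to a single extreme rating than a mean, which may be
why it was chosen.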

The system comprises a rich set of functions, most of which were introduced
because of user requests (see below). Many features may not be needed
by all users, e.g., because they are rather specific and may, for example, only be
of interest to workspace administrators. Therefore, the system supports the
concept of user profiles. Users can choose between a Beginner, Advanced and Expert
profile. In the Beginner profile only a subset of the functionality is visible in the
interface, which reduces the number of buttons and thereby makes it easier for
novice users to comprehend. In the Advanced profile, which a user might select
after he or she has become familiar with the system, the functionality is increased
and more buttons appear in the interface accordingly. In the Expert profile the
full functionality of the system is available, often only “one mouse click away”,
but at the expense of a rather complicated interface. Furthermore, each user
may create his or her own interface by starting with one of the three predefined
profiles and then adding or removing buttons to suit his or her particular
requirements. (More details are given in [1].)
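
The profile concept can be pictured as sets of visible buttons. In the sketch
below only the three profile names come from the text; the assignment of
buttons to profiles and the function name are invented for illustration.

```python
# Hypothetical button sets; each profile extends the previous one.
BEGINNER = {"Add Document", "Add Folder", "Catch up"}
ADVANCED = BEGINNER | {"Add URL", "Send"}
EXPERT = ADVANCED | {"Rate", "Copy", "Add Member"}

PROFILES = {"Beginner": BEGINNER, "Advanced": ADVANCED, "Expert": EXPERT}


def custom_interface(base, add=(), remove=()):
    """Start from a predefined profile, then add or remove buttons."""
    return (PROFILES[base] | set(add)) - set(remove)
```

For example, `custom_interface("Beginner", add=["Rate"])` yields the Beginner
buttons plus "Rate", without changing the predefined profile itself.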

In addition, there are a number of other tools contained in the BSCW system, 
for example, so-called uploaders, applications that transfer a file from a local 
file store into a particular location on a BSCW server. These tools provide more 
support for file uploading than is currently built into Web browsers. For example, 
they allow multiple file transfer or drag-and-drop uploading. 

Figure 2 is an example of the user interface of the BSCW system. It shows a 
listing of the folder “SOFSEM 2000” for user “Bauhmann” . The folder contains 
two sub-folders (“Conference Proceedings” and “Submitted Papers”), a link to 
another Web page (“SOFSEM ’99 Home Page”), a text document (“Important 
dates for ... ”) and a discussion object (“Shall we extend ...”). The icon in front 
of each object’s name indicates the type of the object. Behind each object is the 
name of the person who created the object and the date when it was created or 
most recently modified. 

At the top of the screen there are buttons for triggering operations such 
as “Add Member” to provide access to this folder to other persons, or “Add 
Document”, “Add Folder”, “Add URL”, etc., to create new objects within the
folder. Other actions such as “Catch up”, “Send”, “Rate” or “Copy” can be 
applied simultaneously to a group of objects which have been marked through 
the tick boxes in front of each object’s name. Further action buttons appear in
a line below each object (e.g., “Modify”, “Verify”, “Fetch”, “Add Note”, “Edit”
or “Replace”) since they are only applicable to one particular object. 

Behind four objects (“Conference Proceedings”, “Submitted Papers”,
“Important dates for ...”, and “Shall we extend ...”) there are event icons which
indicate that events occurred recently, e.g., the objects “Important dates for ...”
and “Shall we extend ...” are new for user Bauhmann, the document “Important
dates for ...” and some other document(s) within the folder “Submitted Papers”
have been read, and there are some further changes in the folders “Conference
Proceedings” and “Submitted Papers”. Clicking on these event icons would give
more details about the event, e.g., which user(s) caused the respective events.

Fig. 2. HTML user interface to a BSCW shared workspace

Figure 3 gives another example of the user interface. Here a form is shown 
which the user has to fill in when he or she wants to upload a document into 
a BSCW workspace. The user has to select the file from the local file system
(/home/appelt/SOFSEM/sofsem.tex), may specify a different name for the
document on the BSCW server (“WWW based collaboration with BSCW”), 
and may also add some additional information to the document (“Figures will 
be ...”). If the Web browser is not able to determine the MIME type of the 
document correctly, the user may set the MIME type explicitly and specify an 
encoding, if applicable. 
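
The fallback logic for the MIME type can be sketched as follows. In the text
the guess is made by the Web browser; the sketch shows the same idea using
Python's standard `mimetypes` module, which guesses from the file extension.
The function name and the default of `application/octet-stream` for unknown
types are assumptions.

```python
import mimetypes


def resolve_mime_type(filename, explicit_type=None, explicit_encoding=None):
    """Return (mime_type, encoding) for an uploaded file.

    Values set explicitly by the user take precedence over the guess.
    """
    guessed_type, guessed_encoding = mimetypes.guess_type(filename)
    mime_type = explicit_type or guessed_type or "application/octet-stream"
    encoding = explicit_encoding or guessed_encoding
    return mime_type, encoding
```

The explicit values correspond to the two optional form fields described above.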

Fig. 3. HTML form for uploading a document

The HTML based listing of the content of folders may look a bit unusual to
novice users of the system; this is mainly caused by the limitations of HTTP and
HTML. The majority of users are today familiar with modern desktop graphical
user interfaces such as Microsoft Windows or Macintosh OS. These interfaces
also contain features such as drag-and-drop and pop-up menus that appear when
clicking, e.g., on the mouse buttons. Unfortunately, Web browser interfaces
are not really graphical but primarily text oriented, because the origin
of HTML is the mark-up of text.

To provide a more convenient and more familiar interface to the BSCW sys- 
tem, we therefore developed a Java applet which provides an additional BSCW 
interface. A user may launch this applet from his or her BSCW start page and 
access the full functionality of the system starting from this interface. At present, 
the functionality of the applet is primarily focussed on browsing through folder 
hierarchies in BSCW workspaces and the traditional HTML interface is still 
deployed for a larger number of operations (e.g., for filling the respective data 
into the form sheet used for uploading a document or creating a new link object), 
i.e., only a subset of the BSCW functionality is currently fully integrated in the 
Java applet. 

Figure 4 gives an example of the Java based interface showing essentially the
same content as Figure 2. This interface looks much less cluttered than the HTML
interface because most of the action icons and buttons have been moved to
pop-up menus which appear when clicking on mouse buttons. As Figure 4 shows,
clicking the right mouse button when pointing to a document displays a menu
with the operations applicable to documents (“Get”, “Rename”, ..., “Delete”).
The list of operations in the menus is filtered against the access rights of the
user, i.e., only those operations are shown which are allowed for the respective
user.

Fig. 4. Java browser interface to a BSCW shared workspace
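
Filtering a menu against access rights can be sketched as below. The mapping
from operations to required rights is hypothetical; the text names the menu
entries but does not describe the rights model in detail.

```python
# Hypothetical right required for each menu operation.
REQUIRED_RIGHT = {
    "Get": "read",
    "Rename": "change",
    "Delete": "delete",
}


def visible_operations(operations, user_rights):
    """Keep only the operations the user is allowed to perform,
    preserving the menu order."""
    return [op for op in operations if REQUIRED_RIGHT[op] in user_rights]
```

A user holding only the read right would thus see a menu containing just "Get".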

Although primarily focussed on asynchronous modes of cooperation, BSCW
also provides some features to enable synchronous communication, e.g., the
monitor applet mentioned above, which is connected to the monitor server (see
Figure 1). The BSCW server and the monitor server communicate with each other: the
BSCW server informs the monitor server about the events, which the monitor
server then distributes to the respective Java applets (see [5] for details).

The monitor applet includes the following features: When a user starts this 
applet, it will show other users who have also launched the applet and who are 
included in the user’s personal address book. (It is assumed that users belonging 
to the same group, i.e., those who cooperate in some way, include each other
in their address books.) The applet can also indicate the activities of users
visible in the applet. Furthermore, it can be used to start a chat session or send 
a message to other users. 
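
The distribution of events to monitor applets can be modelled as a small
publish/subscribe sketch. It is an in-memory illustration only: the class and
method names are assumptions, a list stands in for each applet's network
connection, and the actual protocol is described in [5], not here.

```python
class MonitorServer:
    """Forwards activity events to the applets of interested users."""

    def __init__(self):
        self._inboxes = {}        # user -> events delivered to the applet
        self._address_books = {}  # user -> set of users he or she follows

    def connect(self, user, address_book):
        """Called when `user` launches the monitor applet."""
        self._inboxes[user] = []
        self._address_books[user] = set(address_book)

    def online_contacts(self, user):
        """Users shown in the applet: online and in the address book."""
        book = self._address_books[user]
        return sorted(u for u in self._inboxes if u in book)

    def publish(self, actor, activity):
        """Called by the BSCW server when `actor` does something."""
        for user, inbox in self._inboxes.items():
            if actor in self._address_books[user]:
                inbox.append((actor, activity))

    def inbox(self, user):
        return list(self._inboxes[user])
```

Filtering by address book on the server side means an applet only receives
events about users its owner has chosen to follow.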

Figure 5 shows an example of the monitor applet with three windows where 
one window shows that the users Bauhmann and Appelt are currently working 
with the BSCW server. The second window shows their activities and the third
window has been launched by Bauhmann for a chat session with Appelt.

Fig. 5. Monitor Applet



5 Usage of the BSCW System 

At a very early stage, we decided to test our ideas and developments in a real 
world setting: When the first version of the system was ready in October 1995, we 
made it publicly available on one of GMD’s Web servers (http://bscw.gmd.de/) 
and invited all interested people to use the system for group cooperation. Fur- 
thermore, we also made the code of the system available for download so inter- 
ested parties could install their own BSCW server. (Licences for the System 
are now available from OrbiTeam Software GmbH, a spin-off company founded 
in 1998. Schools and universities can usually receive a royalty free license for 
educational purposes.) 

In fact, we attracted several hundred users within a few weeks and soon
received a considerable number of emails with feedback on problems that users
had with the system or on improvements and extensions they wanted. We
therefore decided that the future development of our BSCW system should be
informed to a large extent by this feedback, since this seemed a very promising
approach to achieving high acceptance in our user community.
Since 1996, some additional funding for the development has been received from
the European Commission’s Telematics Applications Programme through the
CoopWWW (1996/1997) and CESAR (1998/1999) projects.

Version 2.0 of the system was released in August 1996, version 3.0 in June 
1997, and the most recent version 3.3 in June 1999. At present (August 1999) 
there are over 20,000 registered users at GMD’s public BSCW server. On an
average day, there are about 30,000 requests from users, resulting in a data
transfer of about 300 megabytes.

The BSCW server software has been downloaded several thousand times
and we know of several hundred operating BSCW servers all over the world,
many of them at universities, so we estimate that there are several tens of
thousands of BSCW users worldwide.

A systematic evaluation of the usage of the BSCW system has not been 
carried out and is probably impossible considering the large number of users. 
We know, however, of a number of different application areas where the system is
used, e.g., project management in large projects with members from different 
organisations, conference organisation including handling of the paper review 
process, teleteaching applications including also some cases where teachers and 
students were located in different countries, and electronic support activities 
between large telecommunication companies and their customers. Two examples
of BSCW usage in a university environment are described in [6] and [2]. 



6 Related Systems 

When the first version of BSCW was released in 1995, it was the first fully Web 
based groupware system. At that time, other (commercial) groupware systems 
were based on private protocols and formats, requiring major software installa- 
tion and maintenance. In 1996, BSCW won the European Software Innovation 
Award (ESIP ’96) for its new approach in groupware developments. 

Within the last few years, however, a number of other groupware systems have 
emerged which are based on Internet and Web technology. This includes systems 
which have been developed from scratch such as Hyperwave [7] or Livelink [8], 
but also systems which have replaced, more or less thoroughly, their private
protocol and format by open standards such as the most recent versions of Lotus 
Domino [9]. 

We believe that BSCW is still one of the leading systems for collaboration 
support. Its strength is surely based on the large feedback from its user commu- 
nity which contributed much to the current status of the system. 

7 Conclusions 

The BSCW shared workspace system is a Web-based CSCW tool offering a wide 
range of features to support collaboration. In particular, the system is considered 
a very useful tool for cooperation in locally dispersed, cross-organisational groups 
using different system platforms. 



78 Wolfgang Appelt 



References 

1. Appelt, W., Hinrichs, E., Woetzel, G.: Effectiveness and Efficiency: The Need for
Tailorable User Interfaces on the Web. Proceedings of the 7th International WWW
Conference, Brisbane, 1998.

2. Appelt, W., Mambrey, P.: Experiences with the BSCW Shared Workspace System
as the Backbone of a Virtual Learning Environment for Students. Proceedings of
the World Conference on Educational Multimedia, Hypermedia and
Telecommunications ED-MEDIA 99, Seattle, 1999.

3. Gorton, I., Hawryszkiewycz, I., Fung, L.: Enabling Software Shift Work with
Groupware: A Case Study. Proceedings of the 27th Hawaii International Conference on
System Science, IEEE Computer Society Press, 1996.

4. Koch, Th.: XML in practice: the groupware case. Proceedings of IEEE WET ICE
Workshop 1999, Stanford University.

5. Trevor, J., Koch, Th., Woetzel, G.: MetaWeb: Bringing synchronous groupware
to the World Wide Web. Proceedings of the European Conference on Computer
Supported Cooperative Work (ECSCW’97). Kluwer Academic Publishers, 1997.

6. Vliem, M. E.: Using the Internet in university education - The application of BSCW
within student projects. Report of the Ergonomics Group, University of Twente,
Enschede, 1997.

7. http://www.hyperwave.com

8. http://www.opentext.com/livelink

9. http://www.lotus.com



Middleware and Quality of Service 



Christian Bac, Guy Bernard, Didier Le Tien, and Olivier Villin 



Institut National des Telecommunications 
9 rue Charles Fourier 91011 Evry Cedex, FRANCE 
{chris, bernard, letien, villin}@etna.int-evry.fr



Abstract. This presentation is a tutorial on, and a survey of the state of
the art in, Quality of Service and support for Quality of Service in
middleware. It is organized as follows:

— First, we briefly introduce the problem of Quality of Service and
middleware.

— The second part presents the concepts used in Quality of Service to
support distributed applications.

— The third part briefly presents middleware concepts and how Quality
of Service is added to existing middleware implementations. It
explains which concepts are common to these implementations.

— Finally, we summarize what has already been done in terms of QoS support
in middleware, what is expected to come, and which unresolved
problems need more work to overcome.



1 Introduction 

This presentation is a tutorial on, and a survey of the state of the art in,
Quality of Service (QoS) and support for Quality of Service in middleware. The
talk has four parts:

— In the first part, we briefly introduce the problem of Quality of Service
and middleware. First we show what the basic needs of QoS systems are
and why middleware is not well structured to meet these needs.
Then we explain why we want to implement QoS-demanding applications in
a distributed object-oriented system.

— In the second part, we present the concepts used in Quality of Service to
support distributed applications.

— In the third part, we briefly present middleware concepts and how Quality
of Service is added to existing middleware implementations.

— Finally, we conclude by summarizing what has already been done in terms of
QoS support in middleware, what is expected to come, and which unresolved
problems need more work to overcome.

1.1 Some Key Concepts 

Let’s jump into the problem first and analyze it in depth afterward. We
want to set up a distributed system based on middleware that is able to support
quality of service constraints. What does this mean?

1. Middleware is a software architecture to support distributed applications.
The main characteristics of middleware are that:

— middleware is designed to ease the building of distributed applications,
especially in the client-server programming style;

— an application in a middleware is defined in terms of functionalities;

— basic interactions between a client and a server rely on a remote invocation
mechanism. Middleware comes with a Remote Procedure Call (RPC) package,
and its capabilities depend on this package. This means that
the usual interaction in middleware is synchronous;

— middleware provides transparency: a client does not need to know the
location of the server it uses. Similarly, middleware allows
architecture independence between clients and servers;

— each middleware is associated with a programming style that is more or
less object oriented. For example, CORBA and Java/RMI are object
oriented; DCE is not.

2. QoS is the evaluation of non-functional requirements. These requirements
relate to the interaction between the user and the computer. QoS is
linked to applications that carry complex information such as images or
sounds; these kinds of applications are usually called multimedia
applications. This means that many of these requirements are subjective
and that they are tied to the way human perception works. For
example, sound is much stricter than images with respect to transmission delay.
These non-functional requirements are described with parameters:

— that can be temporal constraints, for example the time elapsed between
the moment when you hit a button and the moment when the
corresponding action is undertaken by the system;

— that may be a tolerable error rate, for example how many words
you can miss in a sentence without being prevented from understanding
the meaning of this sentence;

— in ISO terminology, QoS is “the global effect of service performance
that determines the degree of satisfaction of a user of this
service”.

1.2 Why Mix Middleware and QoS? 

To summarize: 

— QoS means resource management, which includes resource location,
resource usage negotiation and control;

— Middleware allows functional execution of an application on an
object-oriented distributed platform;

— QoS and middleware: middleware is a good basis for developing distributed
applications. Why not use it to develop distributed multimedia
applications? If we do, we need a QoS-aware middleware.

Now that we have briefly described the theme, let’s investigate in more depth
what Quality of Service means and how it is usually added to distributed systems.

2 QoS in Distributed Multimedia Systems

This section describes fundamentals in QoS. 



2.1 Layered Model 

As shown in figure 1, a distributed QoS system must support QoS at different 
layers [14]: 

user: this level allows the user to specify global parameters such as good quality,
standard quality or low quality. It can be associated with a graphical user
interface;

application: the application must be able to quantify the QoS specified by the
user and negotiate parameters with the system. It uses the user specification
to calculate data size, rate and error. For example, it translates the “good
image” selection into full color, 300 x 300 pixels, 30 images per second;

system: when the system negotiates the quality of service with the application,
it translates the application parameters into values for the resources it
manages. This is called resource mapping. It can be split into two parts:

— the requirements on the communication services;

— the requirements on the operating system services on each site;

device: the system is in charge of allocating the resources corresponding
to devices, which can be:

— multimedia devices on the site;

— and devices connected to the network.
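
The translation from user level to application level to system level can be
sketched with the “good image” example from the list above. Only the figures
for “good” (300 x 300 pixels, 30 images per second, full color) come from the
text. Taking “full color” to mean 24 bits per pixel, the values for the other
two levels, and the use of uncompressed bandwidth as the system-level
requirement are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VideoParams:
    """Application-level description of a video stream."""
    width: int
    height: int
    bits_per_pixel: int
    frames_per_second: int

    def raw_bits_per_second(self):
        # Uncompressed data rate of the stream.
        return (self.width * self.height
                * self.bits_per_pixel * self.frames_per_second)


# User level -> application level. Only the "good" row follows the text.
USER_LEVELS = {
    "good": VideoParams(300, 300, 24, 30),
    "standard": VideoParams(300, 300, 8, 15),   # assumed values
    "low": VideoParams(150, 150, 8, 10),        # assumed values
}


def map_to_system(level):
    """Application level -> system level: derive a resource requirement."""
    params = USER_LEVELS[level]
    return {"network_bits_per_second": params.raw_bits_per_second()}
```

Under these assumptions the “good” level maps to 300 x 300 x 24 x 30 =
64,800,000 bits per second before any compression.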




[Figure; surviving labels: System QoS, Device QoS, Network QoS, Multimedia device, Network]

Fig. 1. Layered model

Figure 2 shows that resources management must be included at each level
using a resource management protocol and negotiating the necessary resources
with the lower layer. In this figure, there is also a special case for the network
layer that needs a QoS aware router.



[Figure; surviving labels: Host, Host]

Fig. 2. Managing resources in a layered model



2.2 Main Operations 

The main operations in a QoS system are: 

Reservation: allocates resources and sets up multimedia streams
according to the application’s QoS requirements;

Distribution: shares resources among applications; this is the usual operating
system task of controlling resource sharing;

Adaptation: although resources are reserved, the system must control
resource usage during execution, because the global system is shared
with non-QoS applications or because components do not behave according
to their commitment.
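
Reservation and the monitoring that feeds adaptation can be sketched for a
single shared resource. The class and method names are hypothetical, and the
simple capacity check stands in for whatever negotiation a real system
performs.

```python
class Resource:
    """A shared resource with a fixed capacity, e.g. link bandwidth."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.reserved = {}  # stream id -> reserved amount

    def reserve(self, stream, amount):
        """Reservation: grant only if the remaining capacity suffices."""
        if sum(self.reserved.values()) + amount > self.capacity:
            return False
        self.reserved[stream] = amount
        return True

    def violations(self, measured_usage):
        """Input for adaptation: streams using more than they reserved."""
        return sorted(stream for stream, used in measured_usage.items()
                      if used > self.reserved.get(stream, 0))
```

A stream that was never admitted has a reservation of zero here, so any usage
by it is reported as a violation.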



2.3 QoS Specification 

To allow QoS negotiation, resource reservation and monitoring, the QoS needs
must be specified according to the following points:

stream synchronization: describes the level of synchronization between related
streams. The well-known example is lip synchronization, which is
necessary to synchronize sound and images in a video film;
stream performance: is directly related to resource reservation because it
gives a description in terms of throughput, delay, jitter and admissible error
rate;
level of guarantee: specifies the capacity of the system to ensure the negotiated
QoS (end to end). It is usually split into three levels: hard guarantee, which
means no contract violation; soft guarantee, which means average contract
compliance; and no guarantee, also called best effort;

QoS policy: is the description of the system’s behavior in case of a contract
violation at execution time;

service cost: is the price to pay for the service. Its purpose is to put a brake
on demands for maximal service.
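
The difference between the guarantee levels can be made concrete with a small sketch. It checks a series of measured delays against a negotiated bound. Reading “average contract compliance” as a bound on the arithmetic mean is our assumption, as are the function name and the sample values.

```python
from statistics import mean


def contract_respected(delays_ms, bound_ms, level):
    """Check measured delays against a negotiated bound.

    level is "hard" (no sample may exceed the bound), "soft" (the
    average must respect the bound) or "best-effort" (nothing is checked).
    """
    if level == "hard":
        return all(d <= bound_ms for d in delays_ms)
    if level == "soft":
        return mean(delays_ms) <= bound_ms
    if level == "best-effort":
        return True
    raise ValueError(f"unknown guarantee level: {level}")


measured = [18, 22, 19, 35, 20]   # hypothetical end-to-end delays in ms
for level in ("hard", "soft", "best-effort"):
    print(level, contract_respected(measured, 25, level))
```

With these samples the single 35 ms delay breaks a hard guarantee of 25 ms, while the soft guarantee still holds because the mean stays below the bound.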



2.4 QoS Mechanisms 

To support QoS, the system must provide the following mechanisms:

QoS provision: allows static resource management at stream initiation.
It is split into three actions:

QoS mapping, which translates the user QoS requirements into the different
layers;

admission test, which checks the translated QoS requirements against the
available system resources;

reservation protocol, which reserves the resources if the admission test is
successful. This is usually an end-to-end allocation.

QoS control: controls the data stream during the data transfer. Its main
components are [3]: flow scheduling, which schedules the different actions
according to the accepted QoS; flow regulation, which verifies that the
data stream conforms to its specification; flow synchronization, which
controls the timing and ordering of operations; and flow control, which may allow
the sender to adapt to the recipient’s speed (breaking the QoS reservation).
QoS management: monitors and services the resources during the data transfer
to respond to system modifications. It does the monitoring and servicing
between layers. In case of QoS degradation it does the signaling, and
tries to manage system dysfunctions and to adapt so that execution can continue.
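
The admission test and reservation steps can be sketched as follows. This is a minimal, single-resource model, assuming that the QoS mapping has already produced one number (for example a bandwidth in kbit/s) per stream; the class and function names are hypothetical.

```python
class ResourceManager:
    """Tracks the reserved share of one resource, e.g. bandwidth in kbit/s."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.reserved = 0

    def admission_test(self, amount):
        return self.reserved + amount <= self.capacity

    def reserve(self, amount):
        self.reserved += amount


def provide_qos(path, amount):
    """Reserve `amount` on every manager along `path`, or on none of them."""
    if not all(m.admission_test(amount) for m in path):
        return False            # admission test failed: nothing is reserved
    for m in path:
        m.reserve(amount)       # end-to-end allocation
    return True


# sender host, network link, receiver host (capacities are made up)
path = [ResourceManager(10000), ResourceManager(2000), ResourceManager(10000)]
first = provide_qos(path, 1500)
second = provide_qos(path, 1500)   # the 2000 kbit/s link cannot take a second stream
print(first, second, [m.reserved for m in path])
```

Testing every element of the path before reserving anything keeps the allocation end to end: a stream is either admitted everywhere or nowhere.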



2.5 Communication Layers 

QoS support is most advanced in the communication layers: there has been early
work on providing QoS at the network and transport layers. There is also ongoing
work on future protocols and QoS.



2.5.1 Network Layer. The network layer must provide large bandwidth
and multicast delivery. It must allow resource reservation, and provide QoS
guarantees and routing protocols to support streams.

A number of reservation protocols have been developed, as well as a
proposal for an IP architecture that allows packet switching according to
QoS classes:






Stream Protocol Version 2 [23]: allows guaranteed service, is connection
oriented, and is based on a stream model. This protocol reserves resources
at connection time.

Real-Time Internet Protocol [20]: specifies an unreliable datagram delivery
service that is connection oriented, with performance guarantees. It uses the
Real-time Channel Administration Protocol to reserve resources.

Integrated services [19]: specifies how to provide QoS on a per-flow basis in
the Internet. It can be combined with RSVP to reserve resources along the data
path.

Differentiated services [8]: specifies a packet stamping method that allows
packet switches to manage priority queues according to the packet stamp;
this approach avoids resource reservation.
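
To show how stamping can replace reservation, here is a small sketch of a forwarder that serves packets in the order of their stamp. Strict priority with first-in, first-out order inside a class is only one possible queueing discipline, chosen here for brevity; the numeric stamps and the class name are assumptions, not part of any standard.

```python
import heapq
from itertools import count


class PriorityForwarder:
    """Forwards packets by class stamp; a lower stamp means a higher priority."""

    def __init__(self):
        self._heap = []
        self._order = count()   # tie-breaker keeps FIFO order inside a class

    def enqueue(self, stamp, payload):
        heapq.heappush(self._heap, (stamp, next(self._order), payload))

    def dequeue(self):
        stamp, _, payload = heapq.heappop(self._heap)
        return stamp, payload


fwd = PriorityForwarder()
fwd.enqueue(2, "web page")
fwd.enqueue(0, "audio frame 1")
fwd.enqueue(1, "video frame")
fwd.enqueue(0, "audio frame 2")
sent = [fwd.dequeue()[1] for _ in range(4)]
print(sent)
```

No per-flow state is kept in the forwarder: the stamp carried by each packet is all it needs, which is what makes the approach scale without reservations.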



2.5.2 Transport Layer. Transport layers have been designed to meet the
needs of continuous media. The Esprit OSI 95 project proposed an Enhanced
Transport Service, TPX [5], which:

1. is able to manage QoS parameters such as throughput, delay, jitter and
error control at connection time;

2. allows QoS semantics such as compulsory, threshold or maximal quality.

The Tenet group at UCB proposed a Real-time Message TP and a Continuous
Media TP [12].

Lancaster University also developed a Multimedia Enhanced Transport
Service [21]. It uses an ATM network and provides a communication service
that is connection oriented and guarantees that packets are ordered, but is
not reliable. This transport service performs resource allocation.



2.6 QoS Architectures 

To allow QoS from application to application, research groups have proposed
architectures that support QoS at the different layers, for example:

1. the Integrated Multimedia Communication Architecture, from Nicolaou at
Cambridge University [15,16]. This architecture focused on managing the
multimedia data flow in an operating system;

2. the Quality of Service Architecture [4], from Lancaster University, which
supports the QoS properties for multimedia applications. It is reviewed as
an example in the next section;

3. the Tenet UCB project, which developed a protocol family for wide-area ATM
networks [9];

4. the HeiProject, at the IBM European Networking Center in
Heidelberg [10,25].

2.7 Platform Example: QoS-A

QoS-A (see figure 3) is a complete example of a QoS architecture that supports
distributed multimedia applications. It allows QoS specification and multimedia
communication in an object-oriented environment.

It exhibits the following key notions:

flows: a flow is the data path from a source (producing data) to a sink
(consuming data);

service contracts: a service contract is the binding between users and providers;
flow management: allows QoS control and servicing.




Fig. 3. QoS-A Architecture 



The architecture is organized according to the following layers:

Distributed System Platform: allows execution of distributed applications,
with services to provide multimedia communications and QoS specification.
Orchestration: provides the synchronization between related multimedia data
flows and jitter correction.

Transport: provides configurable services and mechanisms.

Network: is the basis for end-to-end support over an ATM network.

The platform is structured not only by layers but also by planes. The protocol
plane is split into two sub-planes: the control plane, which is reliable, full
duplex and slow, and the user plane, which is unreliable, one way and wide band.
The maintenance plane does the global management; it is in charge of the
monitoring and servicing in each layer. Stream management provides end-to-end
admission control and is in charge of QoS mapping and adaptation.

QoS-A uses a Service Transport Contract that describes the commitment
between the application and the system. This contract contains the following
information:

1. the QoS specification, described in terms of throughput, delay, jitter and
error rate;

2. the level of agreement, which allows some flexibility in the commitment between
the two parties. It can be deterministic (firm requirements), adaptive
(statistical) or best effort;

3. the adaptation policy, which offers options like filtering and adaptation. It
describes the actions to be taken in case of contract violation;

4. the level of monitoring and servicing;

5. the reservation method, which can be on demand, fast or advanced;

6. the cost, which specifies the service price.
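
The six items of such a contract map naturally onto a record type. The sketch below is one possible encoding, not the QoS-A data structure: the field names, units and the `violations` helper are assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Agreement(Enum):
    DETERMINISTIC = "deterministic"
    ADAPTIVE = "adaptive"
    BEST_EFFORT = "best effort"


@dataclass
class ServiceContract:
    throughput_kbps: float      # 1. QoS specification
    delay_ms: float
    jitter_ms: float
    error_rate: float
    agreement: Agreement        # 2. level of agreement
    adaptation_policy: str      # 3. action taken on contract violation
    monitoring: bool            # 4. monitoring and servicing requested or not
    reservation: str            # 5. "on demand", "fast" or "advanced"
    cost: float                 # 6. service price


def violations(contract, measured):
    """Return the names of the measured parameters that break the contract."""
    broken = []
    if measured["throughput_kbps"] < contract.throughput_kbps:
        broken.append("throughput_kbps")
    for name in ("delay_ms", "jitter_ms", "error_rate"):
        if measured[name] > getattr(contract, name):
            broken.append(name)
    return broken


contract = ServiceContract(
    throughput_kbps=1500, delay_ms=100, jitter_ms=10, error_rate=0.01,
    agreement=Agreement.ADAPTIVE, adaptation_policy="filtering",
    monitoring=True, reservation="on demand", cost=2.0,
)
measured = {"throughput_kbps": 1400, "delay_ms": 80,
            "jitter_ms": 12, "error_rate": 0.001}
print(violations(contract, measured))
```

A monitor could call `violations` periodically and, when the list is not empty, trigger the action named in `adaptation_policy`.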

2.8 Conclusion on QoS

To summarize, the following points are mandatory to allow QoS control in a
distributed multimedia environment:

QoS Specification: at the user level, must be mapped onto the different layers.

Service Level: specifies the system commitment to the user QoS. It is often
best effort, adaptive or guaranteed.

QoS management: must take place at different moments:

— static: at admission control, it does the resource reservation and
QoS mapping.

— dynamic: during the data transmission, it does the monitoring, control,
adaptation and servicing.

Connection: is the abstraction used to model a media stream. It describes a
data path and the associated resource reservation.

3 Middleware 

In this section we briefly describe what middleware is and, in particular,
what a CORBA environment is.

3.1 Characteristics 

A definition of middleware is:

a distributed environment for application reusability,
portability and interoperability.

The common points of middleware are:

— the use of an Interface Definition Language (IDL) that allows the description
of client and server interactions. The IDL is intended to be language and
implementation independent.

— every middleware is based on a Remote Procedure Call (RPC) package that it
uses for the interactions between clients and servers. The RPC package is also
associated with a naming service that allows the communication to be location
independent (i.e. client and server do not know each other’s location).

— some middleware allows execution control; for example, DCE comes with its
threads package.

— middleware adds specific functions that rely on the RPC and naming services;
for example, CORBA defines trading and persistence services, and DCE
defines a file system service.

The rest of this presentation is mainly based on CORBA because
the most active middleware and QoS community works in CORBA environments.



3.2 OMA Architecture 

CORBA is based on a general architecture called the Object Management
Architecture (OMA) [24]. In the OMA, an object is an entity that can be accessed
through an interface. This architecture (see figure 4) is based on:

the Object Request Broker (ORB), which enables communication between
clients and objects;

the Object Services, which offer general-purpose interfaces used to create
services; for example, the trading service and the naming service are
object services;

the Common Facilities, which are also general purpose but
are more oriented toward end-user applications;
the Domain Interfaces, which complement the Object Services and
Common Facilities for a specific domain, for example Product Data
Management or Telecommunications;

the Application Interfaces, which are developed specifically for an application.



[Figure 4: boxes labelled Application Interfaces, Domain Specific Interfaces, Common Facilities, Object Request Broker and Object Services]

Fig. 4. OMA Architecture

3.3 CORBA Architecture

The Common Object Request Broker Architecture (CORBA) [18] details the
interfaces and characteristics of the ORB in the OMA. The main features
of CORBA 2, shown in figure 5, are:

— ORB Core 

— OMG IDL 

— Interface Repository 

— Language Mappings 

— Stubs and Skeletons 

— Dynamic Invocation and Dispatch 

— Object Adapters 

— Inter-ORB Protocols. 




Fig. 5. ORB Architecture 



The component of greatest interest for middleware and QoS is the ORB
core, which we describe in the next section.



3.4 ORB Functionalities 

The ORB delivers requests to objects and returns responses to the client making
the request. To achieve transparency, an ORB hides the following:

— Object location: The client does not know where the target object resides. 

— Object implementation: The client does not know how the target object is 
implemented. 

— Object execution state: The client does not need to know whether the target 
object is already started or not. 

— Object communication mechanism: The client does not know what commu- 
nication mechanism the ORB uses to deliver the request and return the 
reply. 
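
These four kinds of transparency can be seen in a toy, in-process broker. It is only a sketch of the idea, not of a real ORB: there is no network, no IDL and no marshaling, and all names are hypothetical.

```python
class ToyBroker:
    """In-process stand-in for an ORB: maps object names to implementations."""

    def __init__(self):
        self._factories = {}    # name -> callable creating the implementation
        self._active = {}       # name -> running implementation object

    def register(self, name, factory):
        self._factories[name] = factory

    def invoke(self, name, operation, *args):
        # Object execution state: the target is started on first use.
        if name not in self._active:
            self._active[name] = self._factories[name]()
        # Location, implementation and transport stay hidden from the caller.
        return getattr(self._active[name], operation)(*args)


class Calculator:
    def div(self, x, y):
        return x // y


broker = ToyBroker()
broker.register("calc", Calculator)
print(broker.invoke("calc", "div", 7, 2))
```

The client only supplies a name, an operation and arguments. Nothing in the call lets it say how fast, or over which path, the request should be served, which is exactly what makes QoS hard to express.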

As already stated, the fact that the ORB hides so many mechanisms is a
brake on the support of QoS in the CORBA architecture. We will now look at
some architectures that try to support QoS in CORBA.

4 QoS CORBA Architectures

The following research groups are working on providing QoS in a middleware
environment based on CORBA:

— Lancaster University, which works on middleware and reflection [2];

— the BBN Systems and Technologies laboratories, which try to provide a QoS
middleware by federating research projects in the USA [26];

— the University of Rhode Island, which works on real time [27];

— the ReTINA consortium, which develops a distributed processing environment
for telecommunications [6];

— the Center for Distributed Object Computing at Washington University
in St. Louis, Missouri, which we will present as an example of middleware
providing QoS support.

4.1 Example: TAO 

Figure 6 shows the TAO [22] architecture as an example of what must be done
to CORBA to support real-time invocations and QoS.




Fig. 6. TAO Architecture 



This implementation uses an ATM network and real-time operating systems.
The architecture modifications include:

1. speedup in the stubs and skeletons;

2. use of an ORB based on specific communication mechanisms called ACE;

3. a tailored, specific ORB interface to support QoS;

4. a special real-time inter-ORB protocol to communicate with the other sites;

5. a real-time object adapter that is priority aware.

The following subsection compares different research
projects that try to combine QoS and CORBA.

4.2 Comparison Tables

Table 1 gives the main goal of each project and states whether it
seemed necessary to modify the IDL to express the QoS constraints in that
project. We can deduce from this table that, even if the OMG states that it will
not allow IDL modification, there is a need for this modification, at least to
express some kind of real-time constraints.



Table 1. IDL modification

Project     Aims                              IDL extension
RHODES      database and real-time CORBA      yes
TAO         real time                         yes
ReTINA      interactive multimedia services   yes
QuO         objects QoS                       yes
Lancaster   multimedia communication          yes



Table 2. ORB extensions in CORBA RT projects

Project     ORB modif.     Services
RHODES      no             priority and global times, concurrency control, events
TAO         RT, RIOP       RT scheduler, events
ReTINA      telecom ORB    info, telecom
QuO         not finished   QoS delegation
Lancaster   yes            interface type verification, events, QoS manager, QoS mapper



Table 2 shows whether it was necessary to modify the ORB to support real-time
and QoS constraints. It also shows what new services have been added.
The table suggests that modifications in the ORB core are
necessary to cover a wide range of QoS support, although some QoS support may
be achieved using only new services. The first service for QoS and CORBA is an
event service, which is present in nearly all the implementations.

Table 3 shows whether the architecture allows some kind of resource
specification, and what kind of operating system and network it uses. The
conclusions from this table are that we need to specify the resources that an
application needs, and that we have to use a real-time operating system and a
network that allows resource reservation (ATM for now).

Table 3. System and network point of view

Architecture   Resources specification   Operating system   Network type
RHODES         no                        RT                 -
TAO            yes                       RT                 ATM
ReTINA         yes                       RT                 multi-protocols
QuO            yes                       RT                 -
Lancaster      yes                       Chorus             ATM



Table 4 exhibits some noteworthy notions in the different architectures.

Table 4. Noteworthy Notions

Architecture   Notions
RHODES         Timed Dynamic Method Invocation, distributed RT scheduling
TAO            GIGABIT sub-system, quick marshaling
ReTINA         binding mechanism
QuO            QDL, delegate object, QoS region
Lancaster      EDL, Query Language, event, binding object



4.3 Bases for a QoS CORBA 

These different architectures teach us what the fundamental bases for
developing a QoS CORBA architecture are:

— The first modification must be at the language level: we must include QoS
control at the IDL level, either by modifying the OMG IDL or by adding some
kind of QoS Description Language that is used to specify the application and
the object server interface.

— The second modification is at the ORB level, so that it becomes a QoS-aware
middleware.

— The third modification is to adapt the middleware to fast networks so
that it is able to use the available bandwidth. This usually means
that we must find some lightweight mechanism, like Lightweight RPC, to
speed up marshaling and unmarshaling and to avoid data copies through
the different layers.

— A QoS middleware needs to be able to schedule its activities with some kind
of real-time algorithm, so it must at least execute on a real-time operating
system.

— A QoS middleware is only manageable over a QoS-aware network,
and nowadays this means using an ATM network.

Let us add some design principles to these bases.

— The QoS part of the architecture must be object oriented, to favor
portability and to give applications uniform access.

— The QoS negotiation must take place through the ORB, and the ORB must
be QoS aware so that it can reserve the necessary resources.

5 Conclusion on QoS and Middleware 

To summarize: middleware separates applications from resources, whereas supporting
QoS requires relations between applications and resources. For an application
deployed on middleware, access to system objects is hard, and the layered view
is functional, so it is hard to express QoS needs.

Current practice exhibits two different types of solutions:

1. research teams that work in the spirit of Open Distributed Processing
(ODP) [17]:

(a) ODP offers meta-solutions to the problem, for example ODP binding
objects and stream control. One can try to map these solutions onto a
given middleware;

(b) Lancaster University [2] proposes to introduce planes and reactivity in
the ORB. It extends a micro-kernel OS to support an ORB that implements
ODP bindings for multimedia streams;

(c) DIMMA [6] uses the ANSA [1] ORB and adds an ODP library and
stream support;

(d) Jonathan [7] is an ORB written in Java that includes reactive
programming.

2. pure CORBA solutions, developed by teams that were not formally
involved in ODP:

(a) OMG RFP 97-05-04 proposes to manage streams through an ORB but
to use a separate connection [13] to create the multimedia data path.
The idea is that one must not stress the ORB, which is not designed to
carry a data stream;

(b) QuO [26] proposes some adaptation mechanisms and quality level
management. The ORB implementation is not yet finished;

(c) TAO [22] develops a fast ORB with task scheduling related to message
priorities. To allow multimedia support, its developers added off-line
scheduling (i.e. admission test) and on-line scheduling (i.e. execution control).

5.1 Future Work

For now, the following points either lack consensus or have not been studied enough:

1. there is still no standard complete enough to describe every type of QoS:

(a) the network level is the most advanced: there are well-known QoS
parameters [11] and support in existing protocols (ATM, RTP);

(b) the debate on whether to create the virtual channel that supports streams
inside or outside the middleware is not finished;

(c) at the ORB level there is no consensus on how an application must
express its QoS needs;

2. the interactions between the middleware and the system have not been explored
enough; QoS middleware maps the QoS constraints to a fixed priority that it
gives to some kind of thread (usually POSIX compliant). The most advanced
project on this is the Nemesis kernel;

3. on other middleware, there is no publication.

Abstract. Most current support for dynamic reconfiguration assumes
that component interfaces specify input and output channels. Compo- 
nent models such as CORBA, however, support a client-server architec- 
ture, where component interfaces describe only the offered services. This 
work discusses the use of an interpreted language as a tool for dynamic 
configuration of distributed applications using CORBA components. We 
describe LuaOrb, a system based on the CORBA Dynamic Invocation 
Interface (DII) and the Dynamic Skeleton Interface (DSI), which pro- 
vides Lua programs with easy access to CORBA servers and allows these 
servers to be dynamically modified. Using LuaOrb, the Lua console itself 
becomes a tool for reconfiguration. LuaOrb uses a structural sub-typing 
model, so that only correctly typed connections are accepted. We also 
discuss possible forms for prescribing a reconfiguration, and their relation 
to LuaOrb. 



1 Introduction 

Component-based programming has been receiving a lot of attention and is fre- 
quently considered the successor of object oriented programming. A fundamen- 
tal point in the concept of component is the separation between interface and 
implementation. To use a component, it is necessary only to know its interface 
— implementations need not and, in general, should not be public. This allows 
client programs to remain valid even when an implementation of an invoked 
component is substituted for a new version. 

CORBA is the most widely accepted component model for open systems,
with several commercial and academic implementations [20]. However, not much
attention has been given to the problem of managing change in CORBA-based
applications. The need for dynamically evolving applications using CORBA is
becoming increasingly apparent, especially in areas such as network management
and other real-time control systems. Fault tolerance and the dynamic integration
of newly available services are important goals in these areas, as is the avoidance
of service interruption.

In CORBA, interfaces are specified using an IDL (Interface Definition
Language). Specifications written in IDL are typically compiled into stubs (on the
client side) and skeleton programs (on the server side). Modifications in the
structure of the application (client side) or in the implementation of components
(server side) require recompilation and service interruption.

In this work, we discuss how an interpreted language can be used to add 
flexibility to CORBA-based programming. We present a set of tools which use 
alternative, dynamic mechanisms for client and server implementation and dis- 
cuss how these tools can be used to support dynamically evolving applications. 
Tools and examples are based on the extension language Lua and its binding to 
CORBA, called LuaOrb. 

In the next section, we define interpreted languages, and discuss their advan- 
tages and disadvantages. We describe Lua and LuaOrb in Section 3, and in Sec- 
tion 4 show how they can be used to solve some classic configuration problems. 
In the last section we draw some conclusions. 

2 Using an Interpreted Language 

We will adopt the following definition: a language will be said to be interpreted
if it offers some mechanism for execution of chunks of code created dynamically;
in other words, if the interpreter is directly accessible to the language. According
to this definition, languages such as Lisp, Lua, Mumps and Tcl are interpreted,
while Pascal, C, C++ and Java are not.

Programs are frequently developed in one environment, and later installed
and configured in a different target environment. Much of this configuration
activity relates to setting program variables (such as IP addresses, local
directories, server names, etc.) to appropriate values. Such configuration may be
done through the use of environment variables or simple text files such as the
X system “resource” files. However, as programs become more complex, configuration
possibilities increase, especially in interface-related issues. Many programs allow
menus to be created or modified, and even new operations (macros) to be defined.
To support such flexibility, configuration must often be controlled by a
full-fledged programming language. This need fueled the development of extension
languages such as Tcl [16] and Lua [8].

With the incorporation of an interpreter to the run-time environment, pro- 
gram configuration files can contain much more than a list of data and options. 
A configuration file can contain initialization routines which use all the expressive 
power available in a programming language (conditionals, loops, abstractions, 
etc). 
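
The following sketch shows a configuration file that is itself a program. It uses Python rather than Lua purely for illustration; under the definition given above Python also counts as interpreted, since `exec` runs code that exists only as a string at run time. The host names and the `DEMO_DEBUG` variable are made up for the example.

```python
# Contents of a hypothetical configuration file, held here in a string.
config_source = """
import os

servers = [f"node{i}.example.org" for i in range(1, 4)]   # a loop
if os.environ.get("DEMO_DEBUG") == "1":                   # a conditional
    log_level = "debug"
else:
    log_level = "info"
"""

settings = {}
exec(config_source, settings)    # hand the file to the interpreter
print(settings["servers"], settings["log_level"])
```

The host application only reads the resulting variables; how they were computed is left entirely to the configuration file.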

In the preceding paragraphs we have been using the term “configuration” in
the sense of tailoring an application to specific needs and environments. However,
this task is not so different in its goals from the configuration of distributed
applications. Rather, these are different points in a continuum of complexity
levels. Therefore, it makes sense to explore the use of a language such as Lua,
which has proved very useful for the extension and configuration of
sequential applications, as a tool for the configuration of distributed applications.

In the context of configuration of applications, the fact that interpreted
languages allow changes to take place with no need for recompilation becomes
especially interesting, as it allows distributed applications to be dynamically modified
with no need for service interruption. The interactivity offered by an interpreted
language also fits in well with configuration requirements for testing and
prototyping: the programmer can use a console to test components and partial
configurations directly, avoiding the need for test stubs.

Two weaknesses generally associated with interpreted languages are efficiency
and robustness. In fact, interpreted languages are normally much slower than
their compiled counterparts (a factor of 10 is not uncommon). However, in
component-based applications, components may be coded in compiled, efficient
languages such as C, with the interpreted language acting only as a flexible
connection element. In this role, the performance penalty that interpretation
imposes may be outweighed by the time spent on communication or input/output
calls. Similarly, robustness in a language must be evaluated in the context of
its use. Static verification is certainly an important ally in the development
of large software projects. Yet conventional programming languages pay a price
for static typing, namely, the loss of polymorphism and flexibility. Besides,
although static typing is not available, interpreted languages usually
rely on sturdy run-time error-checking mechanisms (for uninitialized variables,
dangling references, etc.) which may be very useful for program debugging. In
the case of CORBA-based applications, it is important to emphasize that each
CORBA component will typically have been developed with a conventional,
statically typed language. The use of an interpreted language as glue between these
components may result in a run-time error if an attempt is made to invoke a
non-existent method or to call a method using incorrect parameters. As explained
in the next section, such situations generate fallbacks in Lua, which are in
some ways similar to exceptions, and may be appropriately handled for program
flexibility.

3 LuaOrb 

LuaOrb is a binding between CORBA and the language Lua, an interpreted 
language developed at PUC-Rio [8,5]. Lua is an extension language, implemented 
as a library. With its API, it is very easy to call Lua functions from C code, as 
it is very easy to register C functions to be called from Lua code. 

The CORBA standard [15] provides communication facilities to applications 
in a distributed environment. All communication between CORBA objects is 
mediated by the Object Request Broker (ORB) [15,21]. A client can interact 
with the broker through stubs or through its Dynamic Invocation Interface. 

OMG IDL is a technology-independent syntax for describing object inter- 
faces. Typically, specifications written in IDL are compiled into client stubs 
(called simply stubs) and server stubs (called skeleton programs). The client 
program is directly linked to the stub. The server program must implement the 
methods declared in the skeleton. 

This approach, used in most current language bindings, such as C++, Java,
and Smalltalk, requires clients to be recompiled each time a change in the server’s
interface takes place or each time a new type of object is to be used as a server.
Servers must be recompiled after any modification to their interface or
implementation.

The CORBA architecture offers two mechanisms which allow programs to 
circumvent this need for re-compilation. On the client side, the Dynamic Invo- 
cation Interface (DII) is a generic facility for invoking any operation with a run- 
time-defined parameter list. On the server side, the Dynamic Skeleton Interface 
(DSI) is an interface for writing object implementations that do not have com- 
pile time knowledge of the type of the object they are implementing [22]. DII, 
DSI, and other CORBA services, like the Trading and Naming Services [21], offer 
the basic mechanisms to support a dynamic distributed object environment. In 
dynamic environments, applications can find out and incorporate new compo- 
nents at run-time. Besides, components can be extended on the fly to incorporate 
new functionality, and applications can be adapted, also on the fly, to compo- 
nent changes. This level of flexibility is very important to some applications, 
such as desktops and operating systems [21], network management tools [2], and 
cooperative applications [12]. 

Because CORBA allows the discovery of the object type and methods at run 
time, it is possible to implement mechanisms of dynamic typing. The interface 
repository (IR) offers support for applications to browse object types, and the 
Naming and Trader services offered by CORBA can be used to address the 
problem of finding out new objects on the system. 

Using DII, the programmer also has access to more method call modes than
when using static stubs. CORBA supports three types of calls: synchronous,
which stands for the traditional RPC semantics; oneway, which allows the client
to invoke a method and continue its execution without waiting for completion;
and deferred synchronous, which allows the client to continue its execution
immediately after a method call but to later poll the server for a result. This last
possibility is available only through the DII.
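
The three call modes can be imitated locally with a thread pool. This is only an analogy written in Python, not CORBA code: `remote_div` is a made-up stand-in for a remote operation, and the pool plays the role of the ORB.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def remote_div(x, y):
    """Stand-in for a remote operation; the sleep imitates network latency."""
    time.sleep(0.05)
    return x // y


pool = ThreadPoolExecutor(max_workers=2)

# synchronous: the caller blocks until the result is available
result_sync = remote_div(10, 2)

# oneway: the call is submitted and its result is never collected
pool.submit(remote_div, 10, 2)

# deferred synchronous: continue working, then poll for the result later
pending = pool.submit(remote_div, 9, 3)
while not pending.done():
    time.sleep(0.01)            # the client could do useful work here
result_deferred = pending.result()

pool.shutdown(wait=True)
print(result_sync, result_deferred)
```

In the deferred case the client keeps a handle on the pending request, which is the piece of state that static stubs do not expose.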

However, using DII and DSI is not a trivial task, and involves querying and
constructing complex structures. Because C and C++ are statically typed languages,
with no automatic memory management, a program must build a dynamic call
step by step, with explicit calls to create the parameter list, to set each parameter
type and value, and so on [13].

It is therefore clear that DII and DSI can be interesting in many cases,
but their use is a difficult task with the currently existing support. The
next sections present LuaOrb, a binding of Lua to CORBA that offers more
suitable support for developing open applications, and allows the management of
change in CORBA-based applications. LuaOrb interacts with Lua only through
the official Lua API, and its implementation required no changes to the language.



3.1 LuaOrb’s Client Binding 

Lua is a dynamically typed language, wherein all type checking is done at run 
time. Variables and functions have no type declarations. Objects in Lua (also 
called tables) have no classes; each object can have any kind of methods and 
instance variables. The mapping between Lua and CORBA has tried to respect
this flexibility of Lua. In that way, it should be possible to use CORBA objects
in the same way as other Lua objects.

Because CORBA objects were to be accessed like any other Lua object,
the generation of stubs was neither necessary nor interesting. Instead, CORBA
objects should be accessible from Lua with no need of previous declarations, and
with dynamic typing. To achieve this goal, the binding was built upon DII.

LuaOrb uses proxies to represent CORBA objects in a Lua program. A proxy
is a regular Lua object that uses fallbacks, a reflexive mechanism of Lua, to
change its default behavior [7]. When a Lua program calls a method from a
proxy, the fallback intercepts the call and redirects it to the LuaOrb binding.
Then, LuaOrb dynamically maps parameter types from Lua to IDL, does the
actual invocation, and maps any results back to Lua. The mapping of parameter 
types is done by trying to coerce Lua values into IDL types, and vice versa for 
result types. This mapping is done between two dynamic type descriptions: the 
actual types of Lua arguments, accessed through the Lua API, and the formal 
types of the method parameters, accessed through the Interface Repository. 
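The interception step can be illustrated in Python, whose __getattr__ hook plays a role loosely comparable to a Lua fallback. This is a minimal sketch under stated assumptions, not LuaOrb's implementation: the INTERFACES dictionary is a hypothetical stand-in for the Interface Repository, FooServant stands in for a remote object, and the call is made locally instead of through DII.

```python
# Hypothetical stand-in for the Interface Repository: operation name ->
# (formal parameter types, result type).
INTERFACES = {
    "foo": {"test": ([], "boolean"), "div": (["long", "long"], "long")},
}

# Coercion functions from scripting-side values to IDL-like types.
COERCE = {"long": int, "boolean": bool, "string": str}


class FooServant:
    """Hypothetical remote object implementing interface foo."""

    def test(self):
        return True

    def div(self, x, y):
        return x // y


class Proxy:
    def __init__(self, interface, target):
        self._ops = INTERFACES[interface]
        self._target = target

    def __getattr__(self, name):
        # Runs only for attributes the proxy itself lacks, much as a
        # fallback intercepts an access to an absent field.
        if name not in self._ops:
            raise AttributeError(name)
        formals, result_type = self._ops[name]

        def invoke(*args):
            if len(args) != len(formals):
                raise TypeError(f"{name} expects {len(formals)} arguments")
            # Coerce each actual argument to its formal parameter type.
            coerced = [COERCE[t](a) for t, a in zip(formals, args)]
            result = getattr(self._target, name)(*coerced)
            return COERCE[result_type](result)

        return invoke


a_foo = Proxy("foo", FooServant())
quotient = a_foo.div("20", 2)  # the string "20" is coerced to a long
```

The proxy needs no generated stub: everything it knows about div comes from the type description consulted at call time.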

To illustrate the use of LuaOrb, we will use the following IDL interface: 

    struct book {
      string author;
      string title;
    };

    interface foo {
      boolean add_book(book abook);
      boolean test();
      long div(long x, long y);
    };

To create a proxy of a distributed object that implements the foo interface, 
we use the createproxy function: 

    a_foo = createproxy("foo")

The createproxy function has an optional second argument, which is the name
of a specific instance of the interface specified in the first parameter. When the
second parameter is specified, createproxy will only succeed if there is a server
object with the name given by this parameter. This function is basically a direct
mapping to its equivalent function in the ORB API.

After a proxy has been created, the services of the related object can be 
requested. For example, the methods of the foo interface can be called as follows: 

    a_book = {author="Mr. X", title="New Book"}
    a_foo:add_book(a_book)
    x = 20
    if a_foo:test() then
      x = a_foo:div(x,2)
    end



100 Noemi Rodriguez, Roberto Ierusalimschy



The first line creates an object with two fields, author and title, initialized
with the given strings. The second line calls the add_book method from object
a_foo (the colon is the Lua operator for method calls). When a_book is used
as argument in this call, LuaOrb automatically tries to convert it to the IDL 
structure book; because a_book has the correct fields with the correct types, the 
conversion succeeds (otherwise, LuaOrb would signal an error). The conversion 
works recursively, so a list of Lua tables can be automatically converted to an 
array of records, for instance. 

Because of its dynamic nature, the type conversions allow many changes in 
an IDL interface not to affect its uses in Lua, such as reordering and removing 
of structure fields, and changes between IDL types with similar representations 
in Lua, such as short and long, or array and sequence. The type system 
that emerges from these properties is one in which structural compatibility is 
enforced [19]. 
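Structural conversion of this kind can be sketched in Python as a recursive function driven by a type description. The descriptions BOOK and BOOK_LIST and the function coerce are hypothetical; they only illustrate matching by field name and the recursive treatment of nested values, and do not reproduce LuaOrb's actual conversion rules.

```python
# Hypothetical type descriptions, standing in for Interface Repository data.
BOOK = ("struct", {"author": "string", "title": "string"})
BOOK_LIST = ("sequence", BOOK)


def coerce(value, idl_type):
    """Convert a Python value into the shape required by idl_type."""
    if idl_type == "string":
        if not isinstance(value, str):
            raise TypeError(f"expected a string, got {value!r}")
        return value
    if idl_type in ("short", "long"):
        # Both integer types share one representation on the scripting side.
        if isinstance(value, bool) or not isinstance(value, int):
            raise TypeError(f"expected an integer, got {value!r}")
        return value
    if isinstance(idl_type, tuple) and idl_type[0] == "struct":
        # Fields are matched by name: order and extra fields do not matter,
        # but a missing field raises KeyError.
        return {name: coerce(value[name], t) for name, t in idl_type[1].items()}
    if isinstance(idl_type, tuple) and idl_type[0] == "sequence":
        return [coerce(item, idl_type[1]) for item in value]
    raise TypeError(f"unknown type description: {idl_type!r}")


a_book = {"title": "New Book", "author": "Mr. X", "year": 1999}
converted = coerce(a_book, BOOK)
shelf = coerce([a_book, a_book], BOOK_LIST)
```

Because only the fields named in the description are consulted, a script that supplies an extra field (year here) keeps working after that field is dropped from the structure.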

As mentioned previously, use of the DII allows a mode of invocation called
deferred synchronous, where a client triggers a method and later polls for
completion. From the server’s point of view, it is transparent whether this invocation
mode is used or not. To specify a deferred synchronous call in Lua, the
programmer simply prefixes the method name with deferred_; for instance,

    a_foo:deferred_test()

A deferred method call returns a handler that can be used later for retrieving 
the method result. 

Deferred synchronous calls are especially natural in event-driven
programming. In this setting, it is useful to be able to define a function to be called
upon the completion of the method. This is supported in LuaOrb through the
completion_event function, which takes as parameters a handler returned by
a deferred synchronous method call and a function. Upon method completion,
this function is called with the method results as parameters.
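The pairing of a deferred call with a completion function has a close analogue in Python futures. The sketch below assumes nothing about LuaOrb's internals: deferred and completion_event are hypothetical Python functions named after the concepts in the text, and a local thread stands in for the remote server.

```python
from concurrent.futures import ThreadPoolExecutor

_pool = ThreadPoolExecutor(max_workers=1)


def deferred(method, *args):
    """Start method in the background and return a handle for its result."""
    return _pool.submit(method, *args)


def completion_event(handle, callback):
    """Arrange for callback(result) to run once the deferred call finishes."""
    handle.add_done_callback(lambda h: callback(h.result()))


received = []
handle = deferred(lambda: 6 * 7)
completion_event(handle, received.append)
_pool.shutdown(wait=True)  # wait here only to make the example deterministic
```

In an event-driven program the final wait would be replaced by the application's main loop.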



3.2 LuaOrb’s Server Binding 

The server binding of LuaOrb allows Lua objects to be used as CORBA servers. 
For that, it uses CORBA’s Dynamic Skeleton Interface (DSI). The basic idea 
of the DSI is to implement all calls to a particular object (which we will call a 
dynamic server) by invocations to a single upcall routine, the Dynamic Imple- 
mentation Routine (DIR). This routine is responsible for unmarshalling the argu- 
ments and for dispatching the call to the appropriate code. The most frequent 
application of DSI has been to implement bridges between different ORBs [23]. 
In this context, the dynamic server acts as a proxy for an object in some other 
ORB. 

Similarly to the client binding, LuaOrb also uses proxies for the server bind- 
ing, but reverses the roles: while in the client side each CORBA object has a Lua 
object as its proxy, in the server side each Lua object that acts as a CORBA 
server is represented by a proxy, which is a C++ object. This object handles all 
method requests for the original Lua object through the DSI upcall routine. For
each call received, the upcall routine unmarshalls the arguments, converts them
to appropriate Lua values, calls the corresponding Lua method, and converts
back the results from Lua to CORBA. The conversions follow the same rules
used by the client binding, but with opposite directions.
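The idea of routing every request through one upcall routine can be sketched as follows. The class DynamicServer and its invoke method are hypothetical, and the marshalling is reduced to converting strings to integers and back; a real DSI implementation works on typed request objects.

```python
class DynamicServer:
    """Proxy that serves requests on behalf of an ordinary object."""

    def __init__(self, servant):
        self._servant = servant  # a dict of functions, like a Lua table

    def invoke(self, opname, wire_args):
        # Single entry point for every request (the role played by the DIR).
        method = self._servant.get(opname)
        if method is None:
            raise LookupError(f"no such operation: {opname}")
        args = [int(a) for a in wire_args]  # "unmarshal" wire strings to longs
        result = method(self._servant, *args)
        return None if result is None else str(result)  # "marshal" the reply


seen = []
listener = {"put": lambda self, i: seen.append(i)}
server = DynamicServer(listener)
reply = server.invoke("put", ["17"])
```

Since the operation is looked up by name at each request, the servant can gain or lose methods while the server is running.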

To see a very simple example of a server written in Lua, let us consider the 
following IDL interface: 

    interface listener {
      oneway void put(long i);
    };

    interface generator {
      void set_listener(listener r);
    };

Our generator object generates numbers (random numbers, for instance), and 
for each number it generates it calls the put method from its listener. We can 
write and use a listener object in LuaOrb as follows: 

    l = { put = function (self, i)
            print(i)
          end
    }

    gen = createproxy("generator")
    gen:set_listener(l)

In the first assignment, we create a Lua object with a single method, called put. 
Then, we create a proxy for a generator, and set that Lua object as its listener. 
When LuaOrb detects that the formal parameter type is an IDL listener, while 
the actual argument is a Lua object, it automatically creates a proxy for this 
Lua object, enabling it to work as a CORBA server. 

LuaOrb also offers an IDL interface for remote server update: 

    interface LuaDSIObject {
      readonly attribute Object obj;
      void InstallImplementation(string opname, string luaCode);
    };

    interface ServerLua {
      LuaDSIObject Instance(CORBA::InterfaceDef intf);
    };

To create a new instance at a dynamically extensible server, a client first invokes 
method Instance. The single parameter of this method is a reference to an 
interface definition in the interface repository. This reference is used by Instance 
to retrieve information about the attributes and methods of the new object. 
The new object is returned as the attribute obj inside a new LuaDSIObject
(the object in this attribute is, in fact, the proxy for the new Lua object). The
manager client can then invoke the method InstallImplementation to install
or modify each of the object’s methods.

To illustrate the use of this interface, we will change our last example, so that
the listener is created remotely, on another machine. We assume that the user
has already created bindings to the Interface Repository (IR) and to a remote
ServerLua (Server):

    -- creates a new object with type "listener"
    l = Server:Instance(IR:lookup("listener"))

    -- creates a new method called "put"
    l:InstallImplementation("put",
          "function (self, i) print(i) end")

    gen = createproxy("generator")
    gen:set_listener(l.obj)

The presented sequence of commands can be interactively issued from a simple 
LuaOrb console, which thus becomes a powerful tool for interactive dynamic 
configuration. 
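Installing a method from source text, as InstallImplementation does, can be imitated in a few lines of Python. The class UpdatableObject and its install_implementation method are hypothetical analogues, and the sketch runs in a single process: it shows only how code shipped as a string can replace a method of a live object.

```python
class UpdatableObject:
    """Hypothetical analogue of LuaDSIObject: methods arrive as source text."""

    def __init__(self, operations):
        self._operations = set(operations)  # names allowed by the interface
        self.obj = {}                       # the object being served

    def install_implementation(self, opname, source):
        if opname not in self._operations:
            raise LookupError(f"{opname} is not part of the interface")
        namespace = {}
        # exec runs arbitrary code: acceptable in a sketch, unsafe for
        # source text that comes from an untrusted client.
        exec(source, namespace)
        self.obj[opname] = namespace[opname]


remote = UpdatableObject(["put"])
remote.install_implementation("put", "def put(self, i):\n    return i")
before = remote.obj["put"](remote.obj, 5)
remote.install_implementation("put", "def put(self, i):\n    return 2 * i")
after = remote.obj["put"](remote.obj, 5)
```

The second call replaces the implementation without recreating the object, so any reference already handed out sees the new behavior.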

3.3 Access to the Interface Repository 

The interface repository (IR), defined as a component of the ORB, provides 
dynamic access to object interfaces. The IR is itself a CORBA object, and can
thus be accessed through method invocations. In general, these methods can be
used by any program, allowing the user, for instance, to browse through available
interfaces. However, the IR is especially important for DII and DSI, the dynamic
interfaces of CORBA. DII allows programs to invoke CORBA servers for which 
they have no precompiled stub. In order to build dynamic invocations, the pro- 
gram must possess information about available methods and their parameters; 
the interface repository provides this information. On the server side, DSI allows 
a server to handle requests for which it has no precompiled skeleton. Again, the 
correct signature of these requests must be obtained from the interface reposi- 
tory. 

LuaOrb provides a library, called LuaRep, to simplify access to the IR. 
LuaRep makes extensive use of Lua data description facilities. Through the 
use of fallbacks, LuaRep reflects the repository information into Lua objects, so 
that any operation on these objects is automatically converted to the equivalent 
operation on the repository. This allows the Lua programmer to manipulate the 
repository information by accessing regular Lua objects. The reflexivity imple- 
mented by LuaRep allows not only queries to the repository, but updates as 
well. 

The possibility of dynamically updating the IR extends the flexibility 
obtained with LuaOrb. It allows a manager client to install not only new imple- 
mentations for existing interfaces, but also unforeseen services in the server, by 
first adding their definitions to the interface repository. 

4 Configuration of Applications

In languages like Conic [10] and Darwin [11], the interface of each component 
is described in terms of input and output channels. In Darwin, for exam- 
ple, a component is described in terms of the communication objects it pro- 
vides (roughly comparable to input channels) and the communication objects it 
requires (roughly comparable to output channels). This approach allows the spec- 
ification of different paradigms of interaction between processes [1]. For instance, 
filtering structures can be easily built from components such as the one shown 
below: 

    component filter {
      provide left <port,int>;
      require right <port,int>;
    }



In the CORBA model [21], interfaces are described in IDL by method
signatures, instead of input and output ports. Method signatures can declare input
and output parameters, besides optional return values. Thus, a method signature 
describes a bidirectional flow of information. 

The two approaches work at different levels of abstraction. In Darwin, the 
interface of a component contains complete information about all the communi- 
cation in which the component is involved. Part of the communication activity 
of a component typically relates to offered services, while some of it may relate 
to a specific implementation. As an example, a component that implements a 
parser may or may not communicate with a file server, according to the way it 
stores temporary information. In CORBA, only offered services are declared in 
the interface of a component, and nothing is known about the methods which 
will in turn be called by this component for implementation purposes. On one 
hand, this gives the programmer much less control over the construction of a 
complete application. On the other hand, this mechanism supports the use of 
existing services as black boxes, about whose implementation the programmer
need not worry. Rather than coupling service requirements to service
provisions, the building of an application in this case consists mainly of decisions
about which services to use.

In some cases, however, the need for objects that offer specific methods may
be part of a CORBA object interface. In object-oriented programming, an
important concept is that of a listener or a callback object. The specification of an
object may define that it must pass on produced data to a consumer object
or register the occurrence of certain events at a registrar object. In such cases,
since communication is part of specification, and not implementation, the
interface of the object must reflect the existence of these communication partners,
for instance by providing methods for defining who the partner is.

Recently, languages like Darwin and OLAN [3] have come to be called architecture
description languages. This reflects the emphasis on a top-down approach, where
support is given for the specification of application structure, and tools are
offered for the translation of this specification to module skeletons, which must
be filled in by the programmer. In component-based programming, a bottom-up
approach to software development may be more appropriate. The goal here
would be, given the need for a new application, to compose it from existing
components, keeping new code to a minimum. Since the focus is on using existing
components, an important role of the configuration language is to act as an
intermediary between components not originally designed to be used together.

In our system, the description of the application is a script written in LuaOrb.
The possibility of invoking arbitrary CORBA servers with no need for previous
declarations allows the set of CORBA components which compose the
application to dynamically evolve. This evolution is implicit: there are no explicit
linking or unlinking operations.

Lua and the LuaOrb framework provide an environment that allows the 
application to evolve in yet another way. Through the ServerLua interface, a 
specific component can be modified or extended from a Lua program running 
on a different machine. This program may be part of the distributed application 
or may be executing as an independent activity. 

In the next subsections, we discuss some classes of change management in
CORBA-based applications. Other configuration examples are presented in [18].



4.1 Interfaces and Application Configuration 

The goal of our first example is to discuss how the concepts of “provisions”
and “requirements”, which are present in classic configuration languages, can
be modeled using Lua and CORBA. The example we use is the primes sieve of
Eratosthenes discussed for Darwin in [9] and [11]. In this application, a process
feeds a stream of numbers to a pipeline of filter processes. Each filter prints the
first number it receives and subsequently filters out multiples of that number
from the stream of numbers.

The interface definition of the filter component used in Regis is basically the
one we presented previously. Reference [11] describes two ways in which a pipeline can be
created with these components. In a static configuration, a fixed number of filter 
components is instantiated, and each right port is connected to a left port of 
another filter through a bind command. In a dynamic configuration, each filter 
component is dynamically instantiated when its predecessor in the pipeline uses 
its left port. In what follows, we will discuss only the dynamic configuration, 
since our main focus here is on application reconfiguration. 

The IDL for the filter component in CORBA, primeFilter, is built from three
interfaces. The provider interface plays the part of an input channel. It gets new
items from calls to its single method put.

    interface provider {
      oneway void put(long i);
    };

The requirer interface models an output channel, that is, a component
requiring an input channel. Its single method, setprovider, connects it to an input
channel, so that, for each new item the interface needs to output, it calls the
put method of its provider.

    interface requirer {
      void setprovider(provider r);
    };

Finally, the primeFilter interface inherits from these two other interfaces. 
Therefore, it acts as a filter, receiving numbers through its put method, and 
passing them to its provider. 

    interface primeFilter: provider, requirer {
    };

The specific behavior of this filter is quite simple: it stores the first
number it receives. After that, it filters out all numbers that are multiples of that
first number; only non-multiples pass through it. An implementation of
primeFilter is sketched in Figure 1.



    class primeFilter {
      provider prov;
      long myprime;

      primeFilter () { myprime=0; }
      void setprovider (provider n) { prov=n; }
      void put (long c) { if (!myprime) myprime = c;
                          else if (c%myprime) prov.put(c);
      }
    }



Fig. 1. Implementation of primeFilter 



Figure 2 presents the configuration of the sieve program in LuaOrb. The 
program prints out primes from 2 to N. The Lua object assigned to variable 
Inf is responsible for the dynamic creation of new filter components. After the 
declaration of the Inf object, a generator object is created. This component 
simply creates a stream of numbers from 2 to N, to be fed to the filter pipeline. 

In this example, as in the next, we use an appServer object to create
new instances of a component. The appServer object represents an application
server, a service that registers interface implementations and supports requests
both for running implementations and for new instances of a given interface.

When the generator invokes put for the first time, it will activate the Lua 
object Inf, which is its current provider object. Execution of Inf :put results in 
the creation of a new primeFilter object, which is then set as the provider for 
the generator. Subsequent invocations of put in the generator will invoke this 
new filter object. This new filter object has Inf as its provider component, so 
    -- define listener for creation of new filters
    Inf = { put = function (self, p)
              print(p)
              local next = appServer:createCorba("primeFilter")
              next:put(p)
              next:setprovider(self)
              last:setprovider(next)
              last = next   -- the new filter is now the end of the pipeline
            end
    }

    -- start application
    gen = appServer:createCorba("generator")
    last = gen              -- initially the pipeline is just the generator
    gen:setprovider(Inf)
    gen:createStream(2,N)



Fig. 2. Configuring the sieve of primes 



the same behavior will be repeated each time a primeFilter invokes put for the 
first time. 
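The dynamic growth of the pipeline can be reproduced in a single Python process. The sketch below is an analogy under stated assumptions: ordinary Python objects replace the CORBA components, the class names are hypothetical, and an explicit last attribute records which component currently ends the pipeline.

```python
class PrimeFilter:
    """Python rendering of the filter sketched in Figure 1."""

    def __init__(self):
        self.myprime = 0
        self.prov = None

    def setprovider(self, provider):
        self.prov = provider

    def put(self, c):
        if not self.myprime:
            self.myprime = c          # the first number received is kept
        elif c % self.myprime:
            self.prov.put(c)          # non-multiples are passed on


class Generator:
    def __init__(self):
        self.prov = None

    def setprovider(self, provider):
        self.prov = provider

    def create_stream(self, first, last):
        for n in range(first, last + 1):
            self.prov.put(n)


class NewFilterListener:
    """Ends the pipeline; every number that reaches it is a prime."""

    def __init__(self, head):
        self.last = head              # component currently ending the pipeline
        self.primes = []

    def put(self, p):
        self.primes.append(p)
        nxt = PrimeFilter()
        nxt.put(p)                    # the new filter adopts p as its prime
        nxt.setprovider(self)         # and forwards survivors to the listener
        self.last.setprovider(nxt)    # splice it in after the previous tail
        self.last = nxt


gen = Generator()
listener = NewFilterListener(gen)
gen.setprovider(listener)
gen.create_stream(2, 30)
```

None of the filters knows that the pipeline is growing; each one only ever sees its current provider.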

This example also illustrates how LuaOrb allows Lua and CORBA objects
to be manipulated homogeneously. Calls to the setprovider method defined in
the requirer interface receive both kinds of objects as arguments at different
moments.

Note that in the proposed scheme, each filter component is aware only of the
existence of a provider object, to which prime candidates must be passed
on. The Lua script creates these new filters, as it identifies new primes, in a way
that is completely transparent to the C++ components.

The paradigm used in this example is data driven. The generator is the 
main active object; filters act as passive objects, activated when a new datum 
is available. We could build this same example with a result-driven paradigm, 
wherein we change the roles of providers and requirers: Providers would have 
a single method get, which returns a new item, and requirers would call this 
method from their providers whenever they need a new item. The possibility 
of these two dual approaches illustrates the flexibility offered by CORBA and 
LuaOrb. 
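The result-driven variant can be sketched in the same spirit. Here each component offers a get method and pulls candidates from its source on demand; the classes Counter and PullFilter and the function first_primes are hypothetical names used only for this illustration.

```python
import itertools


class Counter:
    """Source of candidates: 2, 3, 4, ..."""

    def __init__(self):
        self._numbers = itertools.count(2)

    def get(self):
        return next(self._numbers)


class PullFilter:
    """Returns the next item of its source that is not a multiple of prime."""

    def __init__(self, prime, source):
        self.prime = prime
        self.source = source

    def get(self):
        while True:
            c = self.source.get()
            if c % self.prime:
                return c


def first_primes(n):
    source = Counter()
    primes = []
    for _ in range(n):
        p = source.get()                 # whatever survives every filter is prime
        primes.append(p)
        source = PullFilter(p, source)   # extend the pipeline at the consumer end
    return primes


result = first_primes(8)
```

Control now lies with the consumer: nothing is computed until first_primes asks for the next item.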



4.2 Event-Driven Applications 

An important class of distributed applications is that of event-driven applications,
where actions must be associated with the occurrence of specific events. This section
discusses how CORBA-based event-driven applications can be coded in LuaOrb.

To illustrate the proposed programming style, we borrow an example used
in [17] to present Glish, a language that supports communication through events.
In this example, a simple distributed application is composed of two
components: one measures some data and the other displays the data. The monitor
component announces it has produced new data by generating an event. The
response to this event is to activate the display component. Using Lua and
CORBA, we can again resort to passing a listener object as an argument to the
monitor; the generation of an event is then modeled by an invocation of this
object.

Possible IDL interfaces for these two components are as follows: 

    interface monitor {
      void measure(long intervalsize, supervisor s);
    };

    interface display {
      void newData(double data);
    };



As described in Section 3.1, LuaOrb provides deferred invocations, which
allow an asynchronous style of programming. The Lua configuration program
can invoke measure asynchronously; it can then proceed with its execution and the
Lua supervisor object will be called when needed. The configuration code for
this example is as follows:

    -- supervisor object
    s = { newData = function (nd)
            d:newData(nd)
          end
    }

    d = appServer:createCorba("display")
    m = appServer:createCorba("monitor")
    m:deferred_measure(5, s)



In this style of component linking, all events pass through the configuration
program. As discussed in [17], this increases the cost of communication, but also
buys flexibility. Suppose that, depending on the value of the data, there is a
need to transform it before passing it on to the display component. This can be
trivially programmed in s:newData. According to the kind of transformation
that is needed, it could either be done by calling a third component, or in the
Lua program itself.

When performance is vital, Glish provides point-to-point links between
programs, through an explicit link statement. Unlike Glish, LuaOrb does not need
extra mechanisms for such links. In LuaOrb, this facility follows directly from
the fact that Lua objects and CORBA objects are interchangeable. So, we can
pass the display component directly as an argument to function measure:

    m = appServer:createCorba("monitor")
    d = appServer:createCorba("display")
    m:deferred_measure(5, d)

4.3 Component Modification

In the previous section we have discussed how new components can be dynam- 
ically added to an application and how the flow of data can be dynamically 
redirected between components. In this section we discuss changes to a compo- 
nent itself. 

For this discussion we do not use an example from the literature, since few 
works discuss runtime changes to components. Instead, we will use an example 
drawn from network management. Figure 3 shows IDL definitions for a com- 
ponent that provides TCP-related information. The definitions represent infor- 
mation provided by the SNMP standard management information base, MIB- 
II [14]. The tcpConnTable attribute contains, at any moment, a description of 
each current TCP connection. Each entry in this table is a tcpConnEntry struc- 
ture, which contains the local and remote ports and IP addresses, and the state 
of the connection. 



    struct tcpConnEntry {
      long tcpConnState;
      string tcpConnLocalAddress;
      long tcpConnLocalPort;
      string tcpConnRemAddress;
      long tcpConnRemPort;
    };

    typedef sequence<tcpConnEntry> tcpConnTable;

    interface tcpGroup {
      readonly attribute tcpConnTable connections;
      tcpConnTable FilterConnections();
    };



Fig. 3. IDL for network management component 



A common problem in network management is the availability of enormous 
quantities of raw data that must be processed by the management application. To 
circumvent this problem, the tcpGroup interface provides a FilterConnections 
method, which returns only the connection table entries that satisfy some crite- 
ria. 

Using LuaOrb, this interface can be implemented by a component that can be
modified dynamically, allowing the administrator to establish different filtering
criteria with no need for re-compilation. For instance, first we can create this
component with a trivial filter that returns the whole table of TCP connections:

    trivialf = "function () return connections end"
    dsiobj = serverlua:Instance(IR:lookup("tcpGroup"))
    dsiobj:InstallImplementation("FilterConnections", trivialf)

Later, we can redefine this implementation, to select only connections that use 
local port 79, say: 

    newf = [[
      function ()
        filtertab = {}
        i = 0
        while connections[i] do
          if connections[i].tcpConnLocalPort==79 then
            tinsert(filtertab, connections[i])
          end
          i = i+1
        end
        return filtertab
      end
    ]]

    dsiobj:InstallImplementation("FilterConnections", newf)

The newf variable holds a string with the code of the new filter function. (Lua 
uses the double square brackets to delimit literal strings that span several lines.) 
After installing this new implementation, we can use it to retrieve the table of 
connections: 

    tcpG = dsiobj.obj
    interestingConnections = tcpG:FilterConnections()



5 Final Remarks 

It seems natural here to relate the problem of component-based configuration 
with the more general problem of software configuration, which deals with cus- 
tomizing software to different needs and environments. 

The use of a textual configuration file for software customization reflects a 
declarative, static style, and can be compared to the use of a declarative architec- 
ture description language in component-based applications. Work in architecture 
description languages, based on a more declarative linking and unlinking style, 
has given much focus to the problem of checking configuration consistency [6,4].
Parsers that check required properties can also be easily built for configuration
files.

This is in fact only another application of the properties of static checking. 
As is always the case, a static solution represents robustness, avoiding many run 
time errors, but limits the flexibility that can be provided. 

With the use of an extension language for software configuration, the
configuration file becomes a program, with the associated difficulties for automatic
checkers, but also with the flexibility that only a programming language can
provide. Besides, execution errors due to possible inconsistencies can often
be caught and handled in an interpreted language.

In this work, we proposed extending the use of extension languages to the 
configuration of component-based applications. No explicit linking or unlinking 
operations are provided: On the one hand, this means no consistency verifica- 
tions are possible; on the other hand, it means that the evolution of the applica- 
tion is controlled with a full programming language. As shown in the examples 
discussed in this work, this results in a great deal of flexibility, allowing not 
only different patterns of component interaction to be dynamically defined but 
the components themselves to be easily modified. These facilities are especially
important if component-based programming is to fulfill its promise in the fields
of software reuse and rapid prototyping.

1 Introduction

Compiler writers have always heavily relied on tools: parser generators for gener- 
ating parsers out of context free grammars, attribute grammar systems for gener- 
ating semantic analyzers out of attribute grammars, and systems for generating 
code generators out of descriptions of machine architectures. Since designing 
such special formalisms and constructing such tools deals with one of the most 
important issues in computer science, courses on compiler construction have 
always formed part of the core computer science curriculum. 

One of the aspects that make modern functional languages like Haskell [3] 
and ML [4] so attractive is their advanced type system. Polymorphism and type 
classes make it possible to express many concepts in the language itself, instead 
of having to resort to a special formalism, and generating programs out of this. 
It should come as no surprise that, with the increased expressibility provided 
by the new type systems, the question arises to what extent such tools may 
be replaced by so-called combinator libraries. In this paper we will present a 
combinator library that may be used to replace conventional parser generators. 

We will be the first to admit that many existing special purpose tools do 
a great job, and that our library falls short of performing in an equally satis- 
fying way. On the other hand there are good arguments to further pursue this 
combinator based route. We will come back to this after we have introduced 
conventional combinator based parsing, since at that point we will have some 
material to demonstrate the points we want to make. Let it suffice to say here 
that the size of our tool boxes is only a fraction of the code size of conventional
tools; we will present a library of parsing combinators that takes fewer than
100 lines, yet provides many features that conventional tools lack. This
paper has enough room to include the full text of the library, something
that can be said of hardly any other existing tool.

Parser combinators have been a joy for functional programmers to use, to 
study, and to create their own versions of [2,1,5]. In many courses on functional 
programming they form the culminating point, in which the teacher impresses 
his students with an unbelievably short description of how to write parsers in 
a functional language. We will do so here too. Next we will explain the advan- 
tages of programming with such combinators, and at the same time present 
    module BasicCombinators0 where
    infixl 3 <|>
    infixl 4 <*>

    type Parser s a = [s] -> [(a,[s])]

    pSucceed v input = [ (v, input) ]
    pFail      input = [ ]

    pSym  :: Eq s => s -> Parser s s
    (<|>) :: Eq s => Parser s a -> Parser s a -> Parser s a
    (<*>) :: Eq s => Parser s (b -> a) -> Parser s b -> Parser s a

    pSym a (b:rest) = if a == b then [(b,rest)] else []
    pSym a []       = []

    (p <|> q) input = p input ++ q input

    (p <*> q) input = [ (pv qv, rest)
                      | (pv, qinput) <- p input
                      , (qv, rest)   <- q qinput
                      ]



Listing 1: BasicCombinators0

some extensions. We will however also indicate some of the disadvantages of this 
extremely simplistic approach; thus in Section 7 we will show how to cure these 
disadvantages. Finally we will sketch how we have constructed a slightly more 
elaborate version of our combinators, which are usually considerably faster and 
are of production quality. 

2 Basic Combinator Parsing 

In Listing 1 we provide the full text of a library for building parsers. A
Parser s a is a function (->) that takes a sequence of tokens as input ([s])
and returns a list ([...]), containing all possible ways in which a prefix of the
input can be recognized and converted into a value of type a, each tupled with
the corresponding remaining part of the input ((a,[s])). A parser pSucceed v
recognizes the empty string and returns the value v as the witness of this
success, whereas the parser pFail always fails, and thus returns an empty list of
solutions. The function pSym takes a single token as parameter and returns a
parser that either recognizes just this token or fails. The function pSym is thus
not a parser but a function that builds a parser. The parser combinator <|>
constructs a new parser out of two alternative parsers, and thus corresponds
to the symbol | in context-free grammars; all it has to do is apply
both parsers to the input and concatenate the two lists of found parses. The
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module TreeParsingl where 
import BasicCombinatorsO 

data Tree = Leaf Char 
5 I Bin Tree Tree 

deriving Show 

pTreel = pSucceed Leaf 

<*> pDigitl 

10 <|> pSucceed (\ _ left right _ -> Bin left right) 

<*> pSym <*> pTreel <*> pTreel <*> pSym O' 

pDigitl = foldr (<|>) pFail (map pSym "0123456789") 

15 foldr op e (a:x) = a ^op^ foldr op e x 

foldr op e [] = e 



Listing 2: TreeParsingl 



combinator <*>, which takes care of the sequential composition, requires a bit 
more explanation. The construct used in its definition is called list comprehen- 
sion. First the parser p is applied to the input, and for all possible ways (<-) in 
which this parser p succeeds with value pv and remaining input qinput, parser 
q is applied to this remaining input qinput. This in its turn results in a list of 
(qv, rest) pairs. There is now some freedom in combining a result pv with its 
corresponding qv values. We have decided here to use function application: the 
parser p has to return a function that is applied to the result of the parser q. 
This fact is clearly expressed in the type of the combinator <*>. 
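The definitions of Listing 1 can be tried out directly. The following is a minimal sketch, assuming a current GHC in which the Prelude already exports an operator <*> (hence the hiding clause). The parser pAB and the type signatures without the Eq context are our own illustrative additions and not part of the library.

```haskell
import Prelude hiding ((<*>))

infixl 3 <|>
infixl 4 <*>

-- A parser returns every way of recognising a prefix of the input.
type Parser s a = [s] -> [(a, [s])]

pSucceed :: a -> Parser s a
pSucceed v input = [(v, input)]

pFail :: Parser s a
pFail _ = []

pSym :: Eq s => s -> Parser s s
pSym a (b:rest) | a == b = [(b, rest)]
pSym _ _                 = []

-- Alternatives: simply concatenate both lists of successes.
(<|>) :: Parser s a -> Parser s a -> Parser s a
(p <|> q) input = p input ++ q input

-- Sequence: the result of p is a function, applied to the result of q.
(<*>) :: Parser s (b -> a) -> Parser s b -> Parser s a
(p <*> q) input = [ (pv qv, rest) | (pv, qinput) <- p input, (qv, rest) <- q qinput ]

-- Hypothetical example: recognise 'a' followed by 'b' and pair them up.
pAB :: Parser Char (Char, Char)
pAB = pSucceed (,) <*> pSym 'a' <*> pSym 'b'
```

With these definitions pAB "abc" yields [(('a','b'),"c")], and pAB "ax" yields the empty list. An ambiguous parser such as pSym 'a' <|> pSym 'a' returns the same parse twice, which illustrates that the list really contains all ways of succeeding.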

In Listing 2 we use the combinators for defining a parser pTree1 for binary
trees that have a digit at their leaves. By loading this module into the Haskell
interpreter Hugs we may now call our first parser:

TreeParsing1> pTree1 "(2(34))"

[(Bin (Leaf '2') (Bin (Leaf '3') (Leaf '4')),"")]

You can see that there is only one way of making the input into a Tree and
that all input was consumed in doing so ("").

You may not immediately see how the second alternative of pTree1
works. For this we have to carefully inspect the fixity declaration of <*>,
i.e. the text infixl 4 <*>; this defines <*> to be left-associative and
thus something of the form f <*> a <*> b <*> c <*> d is interpreted as
((((f <*> a) <*> b) <*> c) <*> d). In our case

pSucceed (\ _ left right _ -> Bin left right)

always returns a function that takes 4 arguments, and that is precisely the
number of components recognized by the remaining part of this alternative.

3 Extending the Library

We will now exploit an important property of our approach: the combinators are 
not operators in some special formalism, but instead are just functions defined 
in a general purpose programming language. This implies that we can write 
functions returning parsers and pass parsers as argument to other functions. This 
enables us to introduce a wealth of derived combinators, that take care of often 
occurring patterns in writing parsers. We have already seen a small example 
of such new possibilities when we defined the parser that recognizes a single 
digit. Instead of writing down a whole lot of parsers for the individual digits and 
explicitly combining these, we have taken the sequence of digits "0123456789", 
have converted each element of that sequence (map) into a sequence of parsers by 
applying pSym to it, and finally have combined all these parsers into a single one 
by applying foldr (<|>) pFail. This function foldr builds a result by starting
off with the unit element pFail and then combining this with all elements in
the list using the binary function (<|>). Since this is an often occurring pattern
the real functional programmer immediately sees an opportunity for abstraction 
here: 

pAnyOf :: Eq s => [s] -> Parser s s
pAnyOf = foldr (<|>) pFail . map pSym 
pDigit2 = pAnyOf "0123456789" 

In Listing 3 we have given definitions for some more often occurring situa- 
tions, and our tree parser might, using these new combinators, also be defined as: 

pTree2 =   Leaf <$> pDigit2
       <|> pParens (Bin <$> pTree2 <*> pTree2)

This parser definition now has become almost isomorphic to the data type defini- 
tion. It should be clear from this example that there is now no limit to extending 
this library. 
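To make the step from the basic to the derived combinators concrete, the following sketch puts together just those definitions that pTree2 needs. It assumes a current GHC, whose Prelude exports operators of the same names, which are therefore hidden. The type signatures and the omission of the Eq context where it is not needed are our own choices.

```haskell
import Prelude hiding ((<*>), (<$>), (<*), (*>))

infixl 3 <|>
infixl 4 <*>, <$>, <*, *>

type Parser s a = [s] -> [(a, [s])]

pSucceed :: a -> Parser s a
pSucceed v input = [(v, input)]

pFail :: Parser s a
pFail _ = []

pSym :: Eq s => s -> Parser s s
pSym a (b:rest) | a == b = [(b, rest)]
pSym _ _                 = []

(<|>) :: Parser s a -> Parser s a -> Parser s a
(p <|> q) input = p input ++ q input

(<*>) :: Parser s (b -> a) -> Parser s b -> Parser s a
(p <*> q) input = [ (pv qv, rest) | (pv, qinput) <- p input, (qv, rest) <- q qinput ]

-- Derived combinators: ordinary functions built from the basic ones.
(<$>) :: (b -> a) -> Parser s b -> Parser s a
f <$> p = pSucceed f <*> p

(<*) :: Parser s a -> Parser s b -> Parser s a
p <* q = (\x _ -> x) <$> p <*> q

(*>) :: Parser s a -> Parser s b -> Parser s b
p *> q = (\_ x -> x) <$> p <*> q

pAnyOf :: Eq s => [s] -> Parser s s
pAnyOf = foldr (<|>) pFail . map pSym

pPacked :: Parser s a -> Parser s b -> Parser s c -> Parser s c
pPacked l r x = l *> x <* r

pParens :: Parser Char a -> Parser Char a
pParens = pPacked (pSym '(') (pSym ')')

data Tree = Leaf Char | Bin Tree Tree deriving (Show, Eq)

pDigit2 :: Parser Char Char
pDigit2 = pAnyOf "0123456789"

-- The parser mirrors the data type: one alternative per constructor.
pTree2 :: Parser Char Tree
pTree2 =   Leaf <$> pDigit2
       <|> pParens (Bin <$> pTree2 <*> pTree2)
```

Applied to "(2(34))" this parser returns the single result [(Bin (Leaf '2') (Bin (Leaf '3') (Leaf '4')),"")], the same answer as pTree1 gave before.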



4 Advantages of this Approach 

As a final example of what can be done we will now show how to construct 
parsers dynamically by writing a parser for an expression language with infix 
operators. An example input is: 

(L+R*)a+b*(c+d)

and the code we want to generate is: 

abcd+*+ 

which is the reversed Polish notation of the input expressions. 

The text (L+R*) indicates that + is left (L) associative and has lower prior- 
ity than *, which is right (R) associative. In this way an unlimited number of 
operators may be specified, with relative priorities depending on their position 
in this list. 






S. Doaitse Swierstra and Pablo R. Azero Alcocer 



module ExtendedCombinators where
import BasicCombinators
infixl 4 <$>, <$, <*, *>, <**>, <??>
infixl 2 `opt`

pAnyOf :: Eq s => [s] -> Parser s s
opt    :: Eq s => Parser s a -> a -> Parser s a
(<$>)  :: Eq s => (b -> a) -> Parser s b -> Parser s a
(<$)   :: Eq s => a -> Parser s b -> Parser s a
(<*)   :: Eq s => Parser s a -> Parser s b -> Parser s a
(*>)   :: Eq s => Parser s a -> Parser s b -> Parser s b
(<**>) :: Eq s => Parser s b -> Parser s (b->a) -> Parser s a
(<??>) :: Eq s => Parser s b -> Parser s (b->b) -> Parser s b

pAnyOf    = foldr (<|>) pFail . map pSym
p `opt` v = p <|> pSucceed v
f <$> p   = pSucceed f    <*> p
f <$  p   = const f       <$> p
p <*  q   = (\ x _ -> x)  <$> p <*> q
p  *> q   = (\ _ x -> x)  <$> p <*> q
p <**> q  = (\ x f -> f x) <$> p <*> q
p <??> q  = p <**> (q `opt` id)

pFoldr alg@(op,e) p
  = pfm where pfm = (op <$> p <*> pfm) `opt` e
pFoldrSep alg@(op,e) sep p
  = (op <$> p <*> pFoldr alg (sep *> p)) `opt` e
pFoldrPrefixed alg@(op,e) c p = pFoldr alg (c *> p)

pList         p   = pFoldr         ((:), []) p
pListSep      s p = pFoldrSep      ((:), []) s p
pListPrefixed c p = pFoldrPrefixed ((:), []) c p

pSome p = (:) <$> p <*> pList p

pChainr op x = r where r = x <**> (flip <$> op <*> r `opt` id)
pChainl op x = f <$> x <*> pList (flip <$> op <*> x)
  where
    f x []          = x
    f x (func:rest) = f (func x) rest

pPacked l r x = l *> x <* r

-- some ad hoc extensions
pOParen = pSym '('
pCParen = pSym ')'
pParens = pPacked pOParen pCParen

Listing 3: ExtendedCombinators

We start by defining a function that parses a single character identifier and
returns as its result that identifier in the form of a string:

pVar = (\c -> [c]) <$> pAnyOf ['a'..'z']

The next step is to define a function that, given the name of an operator,
recognizes that operator and as a result returns a function that will concatenate
the two arguments of that operator and postfix it with the name of the operator,
thus building the reversed Polish notation:

pOp name = (\ left right -> left++right++[name]) <$ pSym name

Note that, by using the operator <$ we indicate that we are not interested 
in the recognized operator; we already know what this is since it was passed as 
a parameter. 

Next we define the function compile. For this we introduce a new combinator 
<@>, that takes as its left hand side operand a parser constructor f and as its 
right hand side operand a parser g. The results v of parsing a prefix of the input 
with g, are used in calling f ; these calls, in their turn, result in new parsers which 
are applied to the rest of the input:

(f <@> g) input = [ f v rest | (v, rest) <- g input ]

Since our input consists of two parts, the priority declarations and the
expression itself, we postulate that the function compile reads:

compile = pRoot <@> pPrios 

First we focus on the function pRoot, that should take as argument the result 
of recognizing the priorities. Here we will assume that this result is a function 
that, given how to parse an operand, parses an expression constructed out of 
operands and the defined operators: 

pRoot prios = let pExpr = prios (pVar <|> pParens pExpr) in pExpr

There is a difference between an operator that occurs in the declaration part 
of the input and one in the expression part: the former may be any operator, 
whereas the latter can only be an operator that has been declared before. For 
the priority declaration part we thus introduce a new parser that recognizes any 
operator, and returns a parser that compiles the just recognized operator using 
the function pOp defined before: 

pAnyOp = pOp <$> pAnyOf "+-*/^" -- just some possible operators

Now suppose we have recognized a left and a right associative operator,
resulting in operator compilers pLeft and pRight. Out of these we can construct a
function that, given the operand parser, parses infix expressions containing pLeft
and pRight occurrences:

pLR factor = (pChainl pLeft . pChainr pRight) factor

Generalizing this pattern to an unlimited number of operators we now deduce 
the definition: 

pPrios = pParens $
           pFoldr ((.), id) ((    pChainl <$ pSym 'L'
                              <|> pChainr <$ pSym 'R') <*> pAnyOp)

Let us now compare this approach once more with the situation where we
would have used a special parser generator. In the combinator approach we can
freely introduce all kinds of abbreviations by defining new combinators in terms
of existing ones; furthermore we may define higher order combinators that take
arguments and return values that may be parsers. This is a property we get for
free here, and is absent in most tools, where the syntax of the input is fixed and
at most some form of macro processing is available as an abstraction mechanism.

Another important consequence of embedding our parser construction in
an existing language is that type checking and error reporting can directly be
done at the program level, and not at the level of some generated program.
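The pieces of this section can be assembled into one small program. The following is a sketch under a few assumptions of our own: a current GHC (so the Prelude operators of the same name are hidden), type signatures and the synonym Level added for readability, and a variant of <@> that flattens the nested result list, so that compile is again a Parser.

```haskell
import Prelude hiding ((<*>), (<$>), (<$), (<*), (*>))

infixl 2 `opt`
infixl 3 <|>
infixl 4 <*>, <$>, <$, <*, *>, <**>

type Parser s a = [s] -> [(a, [s])]

pSucceed :: a -> Parser s a
pSucceed v input = [(v, input)]
pFail :: Parser s a
pFail _ = []
pSym :: Eq s => s -> Parser s s
pSym a (b:rest) | a == b = [(b, rest)]
pSym _ _                 = []
(<|>) :: Parser s a -> Parser s a -> Parser s a
(p <|> q) input = p input ++ q input
(<*>) :: Parser s (b -> a) -> Parser s b -> Parser s a
(p <*> q) input = [ (pv qv, rest) | (pv, qinput) <- p input, (qv, rest) <- q qinput ]

(<$>) :: (b -> a) -> Parser s b -> Parser s a
f <$> p = pSucceed f <*> p
(<$) :: a -> Parser s b -> Parser s a
f <$ p = const f <$> p
(<*) :: Parser s a -> Parser s b -> Parser s a
p <* q = (\x _ -> x) <$> p <*> q
(*>) :: Parser s a -> Parser s b -> Parser s b
p *> q = (\_ x -> x) <$> p <*> q
(<**>) :: Parser s b -> Parser s (b -> a) -> Parser s a
p <**> q = (\x f -> f x) <$> p <*> q
opt :: Parser s a -> a -> Parser s a
p `opt` v = p <|> pSucceed v

pAnyOf :: Eq s => [s] -> Parser s s
pAnyOf = foldr (<|>) pFail . map pSym
pFoldr :: (a -> b -> b, b) -> Parser s a -> Parser s b
pFoldr (op, e) p = pfm where pfm = (op <$> p <*> pfm) `opt` e
pList :: Parser s a -> Parser s [a]
pList = pFoldr ((:), [])
pChainr, pChainl :: Parser s (a -> a -> a) -> Parser s a -> Parser s a
pChainr op x = r where r = x <**> (flip <$> op <*> r `opt` id)
pChainl op x = foldl (\v f -> f v) <$> x <*> pList (flip <$> op <*> x)
pParens :: Parser Char a -> Parser Char a
pParens x = pSym '(' *> x <* pSym ')'

-- The expression compiler of this section.
pVar :: Parser Char String
pVar = (\c -> [c]) <$> pAnyOf ['a'..'z']

pOp :: Char -> Parser Char (String -> String -> String)
pOp name = (\left right -> left ++ right ++ [name]) <$ pSym name

pAnyOp :: Parser Char (Parser Char (String -> String -> String))
pAnyOp = pOp <$> pAnyOf "+-*/^"

-- One priority level: turns an operand parser into an expression parser.
type Level = Parser Char String -> Parser Char String

pPrios :: Parser Char Level
pPrios = pParens $ pFoldr ((.), id)
           ((pChainl <$ pSym 'L' <|> pChainr <$ pSym 'R') <*> pAnyOp)

pRoot :: Level -> Parser Char String
pRoot prios = let pExpr = prios (pVar <|> pParens pExpr) in pExpr

-- Assumption: the results of the dynamically built parser are flattened.
(<@>) :: (b -> Parser s a) -> Parser s b -> Parser s a
(f <@> g) input = [ r | (v, rest) <- g input, r <- f v rest ]

compile :: Parser Char String
compile = pRoot <@> pPrios
```

The first element of compile "(L+R*)a+b*(c+d)" is ("abcd+*+",""), the reversed Polish notation promised at the start of this section. The remaining elements of the list are shorter parses that leave part of the input unconsumed.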

5 Analysis 

Before going on let us reflect for a while on why this all works so neatly; somehow 
we managed to define a new language within an existing one. There are many 
important aspects to be distinguished here. 

In the first place having polymorphic types is essential. We managed to keep 
the types of the parser and the types of the result completely separated. There 
is no way in which the parsers can inspect those values or mutilate them. All 
they know is that they have types like a and a -> b. If these are the only 
things known, the only thing a parser can do is apply the one to the other, but 
that was the intention of providing these types. So the combinators really are a 
conservative extension of the rest of the program. 

In the second place the type classes make it possible to precisely define the 
interfaces between the parsing part and the rest of the program. For the parsing 
it is necessary to know whether tokens are equal or not, and precisely this fact 
is thus specified in the context part (Eq s =>) of the type of the combinators. 
This is the piece of program text needed, but also the only available property of 
tokens in the parsing part of your programs. 

The third issue that makes things look nice is of a more syntactic nature: 
by being able to define new infix operators, parser definitions can be made to 
resemble grammars, thus taking away another reason for having a special for- 
malism. 

Although the combinators defined before look very attractive, they have some 
serious shortcomings, that make them almost unusable in practice. 

Because we have used the list of successes approach, the result for incorrect 
inputs will be an empty list, and no indication what went wrong and where is 
given. Furthermore we rely on backtracking for finding all possible parses; our 
(extended) combinators were cleverly defined in such a way that they return the 
longest possible parse first, and this is usually what one wants. When the input 
contains errors or we are not primarily interested in the greedy solution, there 
is a high backtracking overhead to be paid. 

In the next section we will attack these two problems, and will come up with
a set of basic combinators that not only report errors but also repair the input.
We will sketch the implementation of an equivalent set of combinators that do
not even suffer from the backtracking overhead. Of course there is a price to
be paid too: they are far more complicated and the number of lines needed for
describing them is almost tenfold; these hundred lines however still compare
favorably with the size of equivalent special purpose tools such as YACC, which
have far less expressive power.



6 Error Locating Parsing Combinators 

Before going into techniques for error correction we first spend some time on
investigating how to get some form of error reporting. Since the two aspects are
deeply intertwined, however, we will only do so briefly.

In case the input is erroneous we will not be able to return a complete parse, 
but we may try to report on how far we got in the input. In order to do so we 
change the type of the parsers such that they not only return a value, but also 
how many tokens were accepted in the parsing process. To this end an extra 
argument is tupled with the input: the number of tokens accepted so far. The 
new combinators are given in Listing 4. 

The main disadvantage of this approach is that, once we have discovered 
where the erroneous point is, we have lost the computation that led us there; 
this implies that we have both lost the necessary contextual information for 
deciding what kind of repair to make, and the information to continue with the 
parsing process from that point on. 

This suggests that we do the error correction as soon as we discover an 
erroneous situation, because then we still have the contextual information at 
our disposal. This however is not trivial. We may locally discover that we got 
stuck, but maybe there is some other alternative that will bring us much further; 
in that case the current state can just be discarded and no further time should 
be wasted on it. In order to make this decision we have to convert our depth- 
first backtracking strategy into a breadth-first strategy, in which we work on 
all possible parses in parallel. Only then will we be able to discover whether 
correcting actions are worthwhile to be taken, or whether there are still other 
alternatives present that can make progress without having to perform such 
corrective actions. 

The basic step we take now is to look at the corrective actions and the decision 
whether they were needed or not separately: we generate a set of candidates 
containing all possible parses and all possible corrections, and decide elsewhere 
which candidate wins. This approach may look horrendously expensive, but we 
will exploit lazy evaluation to prevent the full computation of all these (possibly 
corrected) parses. This will also take care of another problem: once we start 
changing the input by adding or deleting symbols, the set of parses is likely to 
become infinite, and we had better avoid computing this whole collection!

type Parser s a = (Integer, [s]) -> [Result s a]

data Result s a = Until   Integer
                | Succeed a (Integer, [s])
                deriving Show

pSucceed v (n, input) = [Succeed v (n, input)]
pFail      (n, input) = [Until n]

pSym a (n, (b:rest)) = [if a == b then Succeed b (n+1,rest) else Until n]
pSym a (n, []      ) = [Until n]

(p <|> q) ninput = p ninput ++ q ninput

(p <*> q) ninput = let presult = p ninput
                   in [ Until np | Until np <- presult ]
                      ++
                      [ Succeed (pv qv) nqrest
                      | Succeed pv nprest <- presult
                      , Succeed qv nqrest <- q nprest
                      ]
                      ++
                      [ Until nq
                      | Succeed pv nprest <- presult
                      , Until nq          <- q nprest
                      ]

parse p input = foldr1 best (p (0, input))
  where a `best` b = if pos a > pos b then a else b
        pos (Until n)         = n
        pos (Succeed _ (n,_)) = n

Listing 4: BasicCombinators1
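The effect of counting accepted tokens is easily observed. The following sketch repeats the combinators of Listing 4 for a current GHC (the Prelude operator <*> is hidden, and pSym is written with a guard). The parsers pABC and pAlt, the type signatures, and the derived Eq instance are hypothetical additions for the purpose of the example.

```haskell
import Prelude hiding ((<*>))

infixl 3 <|>
infixl 4 <*>

data Result s a = Until Integer               -- got stuck after this many tokens
                | Succeed a (Integer, [s])    -- value, tokens accepted, rest
                deriving (Show, Eq)

type Parser s a = (Integer, [s]) -> [Result s a]

pSucceed :: a -> Parser s a
pSucceed v ninput = [Succeed v ninput]

pSym :: Eq s => s -> Parser s s
pSym a (n, b:rest) | a == b = [Succeed b (n + 1, rest)]
pSym _ (n, _)               = [Until n]

(<|>) :: Parser s a -> Parser s a -> Parser s a
(p <|> q) ninput = p ninput ++ q ninput

(<*>) :: Parser s (b -> a) -> Parser s b -> Parser s a
(p <*> q) ninput =
  let presult = p ninput
  in  [ Until np | Until np <- presult ]
   ++ [ Succeed (pv qv) nqrest | Succeed pv nprest <- presult
                               , Succeed qv nqrest <- q nprest ]
   ++ [ Until nq | Succeed _ nprest <- presult, Until nq <- q nprest ]

-- Select the result that got furthest into the input.
parse :: Parser s a -> [s] -> Result s a
parse p input = foldr1 best (p (0, input))
  where a `best` b = if pos a > pos b then a else b
        pos (Until n)          = n
        pos (Succeed _ (n, _)) = n

-- Hypothetical example parsers.
pABC :: Parser Char String
pABC = pSucceed (\a b c -> [a, b, c]) <*> pSym 'a' <*> pSym 'b' <*> pSym 'c'

pAlt :: Parser Char String
pAlt =   pSucceed (\a b -> [a, b]) <*> pSym 'a' <*> pSym 'b'
     <|> pSucceed (: [])           <*> pSym 'x'
```

Here parse pABC "abc" gives Succeed "abc" (3,""), whereas parse pABC "abx" gives Until 2: two tokens were accepted before the parser got stuck. For parse pAlt "ac" the first alternative gets one token further than the second, so the answer is Until 1.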



7 Error Correcting Parsing Combinators 

Since we have decided to deal with the error correction as an integrated part 
of the parsing process, we will start with a closer inspection of the kind of 
corrections we want to make. The point where we discover that no progress can 
be made is in the function pSym, where we expect to see a specific token at the 
head of the input stream, but unfortunately find something different. In this case 
there are basically two ways to make progress:

- insert the expected token at this position

- delete the unexpected token at this position and try again

So a first attempt for the function pSym is something of the following form:

pSym a input@(b:rest)
  = if a == b then [(b, rest)]
    else [(a, input)]   -- pretend that the token was seen
         ++             -- or
         pSym a rest    -- delete the unexpected token from the input
                        -- and try again
pSym a [] = [(a, [])]   -- pretend the token was there

On close inspection one sees that this version of pSym actually constructs
all possible inputs and tries to match those to the given input.
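A small experiment shows how quickly the number of candidates grows. The sketch below is the first-attempt pSym on the original list-of-successes type, written with guards; the type signature is our own addition.

```haskell
type Parser s a = [s] -> [(a, [s])]

-- First attempt at a correcting pSym: on a mismatch, either pretend the
-- expected token was there (insertion) or drop the offending token and
-- retry (deletion).
pSym :: Eq s => s -> Parser s s
pSym a input@(b:rest)
  | a == b    = [(b, rest)]
  | otherwise = (a, input) : pSym a rest
pSym a []     = [(a, [])]
```

For example, pSym 'a' "xy" returns [('a',"xy"),('a',"y"),('a',"")]: one insertion without deletions, one after deleting 'x', and one after deleting both tokens. Every symbol of a grammar multiplies the candidates in this way, and nothing in the results tells us which of them involved the fewest corrections.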

The question that arises now is which parse to select; the given approach 
generates a tremendous number of parses, most of them corresponding to heavily 
mutilated versions of the input. So in our next step in the development we 
combine each result with information about its quality. For this we introduce 
a new data type that represents a parsing history as a sequence of acceptance 
(Okstep) and correction (Failstep) steps. A first attempt of this approach is 
given in Listing 5. Instead of passing around an integer indicating how many 
steps were successfully taken in the past, each result is now tupled with how 
many steps were successfully taken in its recognition: so “counting” starts at the 
end of the recognized part, instead of at the beginning. For correct inputs the 
length of the list of steps will, once we return the value at the root, be the same 
as the integer tupled with the result in the previous attempt. 

We have not given here the function best yet, since this solution is erroneous 
anyway. A moment of thought will show that the final result of the parsing 
process is, since most languages are infinite, likely to be infinite too: the given 
input can, with a sufficient number of correcting actions, be changed into each 
of the sentences of the language. A further shortcoming of this approach is that
after recognizing a p<*>q, each sequence of steps of p is prefixed to many q-steps.
As a consequence many resulting sequences will have long common prefixes, 
which makes the comparison process prohibitively expensive. Also the blunt 
way of concatenating the steps is extremely costly, since building sequences by 
repeated concatenation of parts tends to be quadratic in the length of the list. 
It is tempting to first select the best element from the different q-solutions, and
only append that solution to the corresponding p-solution, but that is wrong 
too: a short q-solution may lead to a better starting point for a parser that is 
invoked after p<*>q. Finally we are likely not to get a result at all, since before 
we are able to construct part of a result of the form psteps++qsteps we have 
to find an appropriate qsteps. As soon as we get in an infinite branch of the 
construction process no more solutions will become available! 



8 Continuation Based Parser Combinators 

module BasicCombinators2 where
infixl 3 <|>
infixl 4 <*>

data Step = Okstep
          | Failstep
type Steps = [Step]

failsalways = Failstep : failsalways

type Parser s a = [s] -> [(a, Steps, [s])]

pSucceed v input = [(v, [], input)]
pFail      input = [(undefined, failsalways, input)]

pSym a input@(b:rest)
  = if a == b
    then [(b, [Okstep], rest)]
    else [(a, [Failstep], input)]
         ++
         [(v, Failstep:steps, r) | (v, steps, r) <- pSym a rest]
pSym a [] = [(a, [Failstep], [])]

(p <|> q) input = p input ++ q input
(p <*> q) input = [ (pv qv, psteps++qsteps, rest)
                  | (pv, psteps, qinput) <- p input
                  , (qv, qsteps, rest)   <- q qinput
                  ]

parse p input = foldr1 best (p input)

Listing 5: BasicCombinators2

A new insight has now popped up, however. If we could, after recognizing a
p<*>q-structure, peek into the future to see what the consequences are of taking
specific alternatives, we could report back to our caller about its future by
combining our local information with that information about our future.
Experienced functional programmers will smell the use of continuation based
techniques here. So the question that now arises is: What should be our new
Parser-type?

Before answering this question let us look for a moment at the data structures
used in a conventional description of a top-down parser for context free
languages:

- The stack of the symbols recognized thus far, capturing the history of the
  parsing process and to be used in the construction of the final result.

- The state of the parser, consisting of a stack of symbols still to be recognized.

- The unconsumed part of the input.

In a continuation passing style all such data has to be passed around by
means of parameters. So a parser that should recognize something of type a
takes the following arguments:

- a history, that may be combined with the recognized value of type a into a
  new history of type b, that is to be passed on to

- a future that will eventually construct something of type d, when passed

- the remaining input

So our parser should be of the following type: 

Future s b d -> History a b -> [s] -> Result d 

with appropriate definitions for Future, History and Result. The obvious 
choice for the history is 

type History a b = a -> b 

because this is the simplest type that can hold values that convert a’s into b’s. 
The type for the future does not leave us much choice either, since it has to 
accept the newly constructed history of type b and the remaining input of type 
[s] and should produce something that contains a type d value:

type Future s b d = b -> [s] -> Result d 

We might have taken the liberty to let the future return a value of type 
Result ^ s d, but that does not turn out to be necessary. 

Finally let us try to design the type Result d. A parser gets this value back 
from the future (i.e. the called continuation), and has to return it on to its past 
(i.e. its caller): the type Result in Listing 6, has been designed in such a way that 
it both represents the final result and the parsing steps in finding that result. 
The Cost field will, for the time being, be chosen to be 0 when an input token 
was successfully recognized, and 1 whenever a symbol was inserted or deleted. 

Note that we do not use a conventional continuation passing style, in which 
continuation calls are usually so-called tail-calls, in which the result of the con- 
tinuation call becomes the result of the calling function. Here we take the oppor- 
tunity to add some information to the result, before returning it to our own 
caller: i.e. we add information about the parsing steps that were taken between 
being called and calling our continuation. 

In Listing 6 we present the final solution, and we will go through this solution 
step by step. In lines 1-11 we repeat the types introduced thus far. The type 
Parser is a bit peculiar since it contains two type variables that do not occur as a 
parameter. In many extensions of Haskell98 however it is possible to denote such 
universally quantified types, provided we locate them inside a newtype definition; 
here such a newtype definition is for all practical purposes equivalent to a normal 
type definition, with the exception that it introduces an extra constructor (P). 

The function definition of pSucceed is straightforward: it combines its
history h with the witness of its success (v) into a new history (h v), and passes
this, together with the input, on to its own future f. The function pFail simply
returns an infinite list of fail steps.

The sequential composition p<*>q starts the parsing of the composition of p
and q by calling p. Since after p first q and then their common future f should be
parsed, we construct the future for p by partially applying q to f. The history of
the call to p should be such that when it is applied to pv its result is a function
that when applied to qv results in h (pv qv), and we see that the function (h .)
does the job since h (pv qv) == (h . pv) qv == (h .) pv qv. For p<|>q we
simply call both alternatives with the same history and future and choose the
best result of the two.

newtype Parser s a = P (forall b d .
                          Future s b d
                       -> History a b
                       -> [s]
                       -> Result d
                       )
type Future s b d = b -> [s] -> Result d
type History a b = a -> b
type Cost = Int
data Result d = Step Cost (Result d)
              | Stop d
-- THE PARSER COMBINATORS
pSucceed v = P (\ f h input -> f (h v) input )
pFail = P (\ f h input -> fails) where fails = Step 1 fails
(P p) <*> (P q) = P (\ f h input -> p (q f) (h .) input )
(P p) <|> (P q) = P (\ f h input -> p f h input `best` q f h input )

pSym a = P (
  \ f h -> let pr = \ input ->
                 case input
                 of (s:ss) ->
                       if s == a then Step 0 (f (h s) ss)
                       else Step 1 (pr ss)             -- delete
                            `best`
                            Step 1 (f (h a) input)     -- insert
                    []     -> Step 1 (f (h a) input)   -- insert
           in pr )

-- SELECTING THE BEST RESULT
best :: Result v -> Result v -> Result v
left@(Step l aa) `best` right@(Step r bb) = if l < r then left
                                            else if l > r then right
                                            else Step l (aa `best` bb)
(Stop v) `best` _ = Stop v
_ `best` (Stop v) = Stop v

Listing 6: BasicCombinators3
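The identity h (pv qv) == (h . pv) qv == (h .) pv qv can be checked on ordinary functions, independently of any parser. In the sketch below the name historyForP and the sample values h, pv and qv are purely illustrative.

```haskell
-- The history passed to p in p <*> q: it first waits for the function pv
-- produced by p, then for the argument qv produced by q, and finally hands
-- pv qv to the original history h.
historyForP :: (a -> b) -> (x -> a) -> (x -> b)
historyForP h = (h .)

-- Hypothetical sample values.
h :: Int -> String
h = show

pv :: Int -> Int
pv = (+ 1)

qv :: Int
qv = 41
```

Both historyForP h pv qv and h (pv qv) evaluate to "42".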

The function pSym checks the first symbol of the input; if this is the sought
symbol the continuation f is called with the new history (h s) and the rest of the
input ss. Once this returns a result, the fact that this was a successful parsing
step is recorded by applying Step 0 to the final result, and that value is returned
to the caller. If the sought symbol is not present both possible corrections are
performed. Of course, when we have reached the end of the input, the only
possible action is to try to insert the expected symbol.
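To see the corrections at work, the combinators of Listing 6 can be run with a very simple driver. The sketch below assumes GHC with the RankNTypes extension. The driver runParser, which treats leftover input as deleted tokens and adds up the costs, and the example parser pAB are hypothetical additions of ours; the actual way of calling parsers is the subject of Section 9.2.

```haskell
{-# LANGUAGE RankNTypes #-}
import Prelude hiding ((<*>))

infixl 3 <|>
infixl 4 <*>

type Cost = Int
data Result d = Step Cost (Result d) | Stop d

type Future s b d = b -> [s] -> Result d
type History a b  = a -> b

newtype Parser s a = P (forall b d. Future s b d -> History a b -> [s] -> Result d)

pSucceed :: a -> Parser s a
pSucceed v = P (\f h input -> f (h v) input)

(<*>) :: Parser s (b -> a) -> Parser s b -> Parser s a
P p <*> P q = P (\f h input -> p (q f) (h .) input)

(<|>) :: Parser s a -> Parser s a -> Parser s a
P p <|> P q = P (\f h input -> p f h input `best` q f h input)

pSym :: Eq s => s -> Parser s s
pSym a = P (\f h ->
  let pr input = case input of
        (s:ss)
          | s == a    -> Step 0 (f (h s) ss)
          | otherwise -> Step 1 (pr ss)                   -- delete s
                           `best` Step 1 (f (h a) input)  -- insert a
        []            -> Step 1 (f (h a) input)           -- insert a
  in pr)

best :: Result v -> Result v -> Result v
left@(Step l aa) `best` right@(Step r bb)
  | l < r     = left
  | l > r     = right
  | otherwise = Step l (aa `best` bb)
Stop v `best` _      = Stop v
_      `best` Stop v = Stop v

-- Hypothetical driver: every token left over at the end counts as a
-- deletion; the result is the witness together with the total cost.
runParser :: Parser s a -> [s] -> (a, Cost)
runParser (P p) input = total 0 (p eof id input)
  where
    eof v rest = foldr (\_ t -> Step 1 t) (Stop v) rest
    total c (Step k r) = total (c + k) r
    total c (Stop v)   = (v, c)

-- Hypothetical example parser.
pAB :: Parser Char (Char, Char)
pAB = pSucceed (,) <*> pSym 'a' <*> pSym 'b'
```

For the correct input, runParser pAB "ab" returns (('a','b'),0). The input "b" is repaired by inserting the missing 'a' at cost 1, and "xab" by deleting the spurious 'x', also at cost 1. Note that a witness is always produced, whatever the input.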

We have kept the most subtle point for dessert, and that is the definition of the
function best. Its arguments are two lists with information about future parsing 
steps, and ideally it should select the one containing the fewest corrections. Since 
computing this optimal solution implies computing all possible futures that at 
least extend to the end of the input, this will be very expensive. So we choose 
to approximate this with a greedy algorithm that selects that list that is better 
than its competitor at the earliest possible point (i.e. that list that comes first 
in a lexicographic ordering). We should be extremely careful however, since it 
may easily be the case that both lists start with an infinite number of failing 
(Step 1) steps, in which case it will take forever before we see a difference 
between the two. The function best has carefully been formulated such that, 
even when a final decision has not been taken yet, already part of the result is 
being returned! In this way we are able to do the comparison and the selection on 
an incremental basis. We just return the common prefix for as far as needed, but 
postpone the decision about what branch is to be preferred, and thus what value 
is to be returned, as long as possible. Since this partial result will most likely 
again be compared with other lists, most such lists will be discarded before 
the full comparison has been made and a decision has been taken! It is this 
lazy formulation of the function best that converts the underlying depth-first
backtracking algorithm into one that works on all possible alternatives in parallel: 
all the calls to best, and the demand driven production of partial results, drive 
the overall computation. In the next section we will complete our discussion 
by describing how to call our parsers, and what functions to pass to our initial 
parsers. 
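The incremental behaviour of best can be observed in isolation. The following sketch repeats the Result type and best from Listing 6; the helper functions costs and result and the value fails are our own additions for inspecting the outcome.

```haskell
type Cost = Int
data Result d = Step Cost (Result d) | Stop d

best :: Result v -> Result v -> Result v
left@(Step l aa) `best` right@(Step r bb)
  | l < r     = left
  | l > r     = right
  | otherwise = Step l (aa `best` bb)   -- emit the common step, decide later
Stop v `best` _      = Stop v
_      `best` Stop v = Stop v

-- An infinite sequence of failing steps, as returned by pFail.
fails :: Result d
fails = Step 1 fails

-- The costs of the steps, in order (possibly an infinite list).
costs :: Result d -> [Cost]
costs (Step c r) = c : costs r
costs (Stop _)   = []

-- The final witness; only terminates for finite sequences of steps.
result :: Result d -> d
result (Step _ r) = result r
result (Stop v)   = v
```

Even when two infinite sequences of fail steps are compared, a prefix of the answer is available: take 3 (costs (fails `best` fails)) yields [1,1,1]. As soon as one branch is cheaper the other is dropped without further inspection, so result (Step 1 (Step 0 (Stop 'a')) `best` Step 1 fails) terminates with 'a'.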

9 Further Details 

Having constructed a basic version of the parser combinators, there still is some 
opportunity for further polishing. In this section we will describe how error 
reporting may be added and how constructed parsers have to be called. We 
finish by pointing out some subtle points where the innocent user might be 
surprised. 



9.1 Error Reporting 

Despite the fact that we have managed to parse and correct erroneous inputs,
we do not know yet, even though we get a result, whether any corrections were
made to the input, and if so which ones. To this end we now slightly extend the
parser type once more, as shown in Listing 7. All parsing functions take one extra
argument, representing the corrections made in constructing the corresponding
history. When further corrections are made this fact is added to the current list
of errors. Errors are represented as a value of type Errors s -> Errors s, so
actually we are passing a function that may be used to construct the Errors s
that we are interested in. As a consequence all combinators change a bit, in the
sense that they all get an extra argument that is just passed on to the called
continuations. Only when corrections are applied in order to be able to continue
do we have to do something, and we have seen that this is local to the function
pSym. The new version of pSym is given in Listing 7.

The errors are returned in the form of a data type, that may be converted
into a string using the function show.

newtype Parser s a
  = P (forall b d . Future s b d
                 -> History a b
                 -> Errs s
                 -> [s]
                 -> Result s d
      )

type Errs s = (Errors s -> Errors s)

data Errors s = Deleted           s s (Errors s)
              | Inserted          s s (Errors s)
              | InsertedBeforeEof s   (Errors s)
              | DeletedBeforeEof  s   (Errors s)
              | NotUsed           [s]

instance Show s => Show (Errors s) where
  show (Deleted  s w e)        = msg "deleted "  s (show w) e
  show (Inserted s w e)        = msg "inserted " s (show w) e
  show (InsertedBeforeEof s e) = msg "inserted " s "(virtual) eof" e
  show (DeletedBeforeEof  s e) = msg "deleted "  s "(virtual) eof" e
  show (NotUsed [])            = ""
  show (NotUsed ss)            = "\nsymbols starting with "
                                 ++ show (head ss)
                                 ++ " were discarded "

msg txt sym pos resterrors = "\n" ++ txt ++ show sym
                             ++ " before " ++ pos ++ show resterrors

-- the new version of pSym
pSym a
  = P (\ f h ->
       let pr = \ errs input
                -> case input
                   of (s:ss) ->
                         if s == a
                         then Step 0 (f (h s) errs ss)
                         else Step 1 (pr (errs . (if null ss
                                                  then DeletedBeforeEof s
                                                  else Deleted s (head ss)
                                                 )
                                         )
                                         ss
                                     )
                              `best`
                              Step 1 (f (h a) (errs . Inserted a s) input)
                      []     -> Step 1 (f (h a) (errs . InsertedBeforeEof a) input)
       in pr
      )

Listing 7: Error Reporting
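Why the corrections are passed around as a function of type Errors s -> Errors s, and not as a finished value, becomes clear from a small example. The sketch below repeats the data type of Listing 7; the renderer describe (a simplified stand-in for the Show instance) and the value example are hypothetical.

```haskell
data Errors s = Deleted           s s (Errors s)
              | Inserted          s s (Errors s)
              | InsertedBeforeEof s   (Errors s)
              | DeletedBeforeEof  s   (Errors s)
              | NotUsed           [s]

-- The corrections made so far, still waiting for the rest of the list.
type Errs s = Errors s -> Errors s

-- Hypothetical renderer: one line of text per correction.
describe :: Show s => Errors s -> [String]
describe (Deleted  s w e)        = ("deleted "  ++ show s ++ " before " ++ show w) : describe e
describe (Inserted s w e)        = ("inserted " ++ show s ++ " before " ++ show w) : describe e
describe (InsertedBeforeEof s e) = ("inserted " ++ show s ++ " before (virtual) eof") : describe e
describe (DeletedBeforeEof  s e) = ("deleted "  ++ show s ++ " before (virtual) eof") : describe e
describe (NotUsed [])            = []
describe (NotUsed (s:_))         = ["symbols starting with " ++ show s ++ " were discarded"]

-- Each correction is appended on the right by function composition, just as
-- pSym does with (errs . Deleted s w); the chain starts with id.
example :: Errs Char
example = id . Deleted 'x' 'b' . InsertedBeforeEof ')'
```

Only at the very end is the chain closed off: describe (example (NotUsed "")) gives the two lines "deleted 'x' before 'b'" and "inserted ')' before (virtual) eof", in the order in which the corrections were made. Adding a correction is a constant-time composition, so the repeated concatenation criticised in Section 7 is avoided.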



9.2 How to Stop? 



Before we can actually parse something we still have to decide what kind of con- 
tinuation to pass to the parser corresponding to the root symbol of the grammar. 
We could have defined this function once and for all, but we have decided to pro- 
vide some extra flexibility here. In our expression example we have already seen 
that one may not only want to stop parsing at the end of the input. The func- 
tion parse takes a Boolean function that indicates whether parsing has reached 
a point that may be interpreted as the end of the input. If this is the case then no 
error message is generated. Otherwise it is reported that there were unconsumed 
tokens (which are assumed to have been deleted) . Furthermore not only the wit- 
ness of the parsing is stored in the resulting value, but also the accumulated 
errors and the remaining part of the input. 



parse (P p) user_eof input
 = let eof v e input
        = if user_eof input
          then             (Stop (v, input, e (NotUsed [])))
          else foldr (\ _ t -> Step 1 t)
                     (Stop (v, input, e (NotUsed input)))
                     input
       stepsresult (Step _ s) = stepsresult s
       stepsresult (Stop v)   = v
   in stepsresult ( p      -- the parser
                    eof    -- its future
                    id     -- its history
                    id     -- no errors thus far
                    input )



10 Pitfalls 

One of the major shortcomings of programming in the way we do, i.e. with many
higher order functions and lazy (and possibly infinite) data structures, is
that a feeling for what is actually going on, and for how costly things are,
easily gets lost.

The first example of such a pitfall occurs when a simple binary operator is
missing in the input between two operands. Usually there are many operators
that might be inserted here, all leading to a correct context free sentence.
Unfortunately the system is, without any further help, not able to distinguish
between all these possible repairs. As a consequence it will parse the rest of
the program once for each possibility, frantically trying to discover a
difference that will never show up. If several such repairs occur the overall
parsing time explodes. It is for this reason that we have included the cost of
a repair step as an Int-value. If this phenomenon shows up, all operators but
one can be given a higher insertion cost, and as a consequence one continuation
is immediately selected without wasting time on the others. Furthermore, in the
complete version of our library the look-ahead for the function best is limited
somewhat, once it has been discovered that two sequences that both contain a
repair step are being compared. It is clear that for the function best there is
still a lot of experimentation possible.
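
The effect of the insertion cost can be illustrated with a small sketch. It is
written in Java rather than in the Haskell of the listings, and it is a
simplification under stated assumptions: each alternative is represented as an
array of per-step costs (0 for an accepted symbol, a positive value for a
repair), and the hypothetical method firstDifference mimics only the idea of
comparing two alternatives step by step, not the library's actual definition
of best.

```java
// Hypothetical illustration; RepairChoice and firstDifference are not part of the library.
final class RepairChoice {
    /**
     * Walks two cost sequences in lockstep and returns the index of the first
     * step at which they differ, or -1 if no difference is found.
     */
    static int firstDifference(int[] left, int[] right) {
        int n = Math.min(left.length, right.length);
        for (int i = 0; i < n; i++) {
            if (left[i] != right[i]) {
                return i;   // the cheaper alternative can be selected here
            }
        }
        return -1;          // the whole remaining input was inspected in vain
    }
}
```

With two operators of equal insertion cost, say {1, 0, 0, 0} against
{1, 0, 0, 0}, the comparison runs to the end of the input without finding a
difference. Raising the cost of one of them, {1, 0, 0, 0} against {2, 0, 0, 0},
settles the choice at the very first step.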

A second problem arises if we have a grammar that may need a long look-ahead
in order to decide between different alternatives. An example of such
a grammar is:

n _ =

Here some heavy backtracking will take place once we try to parse an input
that starts with many a’s. Of course this problem is easily solved here by
rewriting the grammar a bit using left-factorization, but one may not always be
willing to do so, especially not when semantic functions like count are
different. Fortunately it is possible to exchange time for space, as we will
show in the next section, and this will be done by the combinators from our
library.

11 Speeding Up 

Despite its elegance, the process of deciding which alternative to take is rather 
expensive. At every choice point all possible choices are explored and usually 
all but one are immediately discarded by the calls to best. In most parsers 
the decision what to do with the next input symbol is a simple table lookup, 
where the table represents the state the parser is in. Using the technique of 
tupling we will sketch how in our parsers we may get a similar performance; 
since the precise construction process is quite complicated, time and space forbid 
a detailed description here. Furthermore we assume that the reader is familiar 
with the construction of LR(0) items, as described in every book on compiler 
construction. 

The basic data structure, around which we center our efforts, is the data type 
Choices in Listing 8. This data structure describes a parser, and is tupled with 
its corresponding real parser using the function mkParser; this is an extension 
of the techniques described for LL(1) grammars in [5]. 

The four alternatives of Choices represent the following four cases: 

Found In this case there is no need to inspect any further symbols in the input;
the parser P s a should be applied at this point in the input.

data Choices s a = Choose (P s a) [(s, Choices s a)]
                 | Split  (P s a) (Choices s a)
                 | End    (P s a)
                 | Found  (P s a) (Choices s a)

mkparser cs
 = let choices
        = cata_Choices
           ( \ (P p) css                                   -- shift
               -> \inp -> case inp
                          of []     -> p
                             (s:ss) -> case find cmp css s of
                                       Just (_, cp) -> (cp ss)
                                       Nothing      -> p
           , \ (P p) css
               -> \inp -> p `bestp` (css inp)              -- reduce and shift
           , \ (P p)
               -> \_   -> p                                -- reduce
           , \ (P p) cs
               -> \_   -> p                                -- only candidate
           ) cs
   in (cs, (P (\ f h e input -> (choices input) f h e input))) -- tuple

p `bestp` q = ( \ f h e input -> p f h e input `best` q f h e input )

Listing 8: MakeParser



Choose Based on the look-ahead inspected thus far it is not yet possible to
decide which parser to call, so we have to use the [(s, Choices s a)]
structure to continue the selection process. If, however, the next input
symbol is not a key in this table, the corresponding P s a is the error
correcting parser that applies here. This state corresponds to a pure shift
state, in LR(0) terminology.

End This corresponds to a pure reduce state. The parser P s a is the parser we
have to call, and we can be sure it will succeed since in the selection process
we have seen all the symbols of its right-hand side.

Split This corresponds to a shift-reduce state and we have two possibilities. So
we continue with the selection process in the Choices s a component, and
apply both the parser found there and the P s a component of the Split,
and compare the results using the function best.

The function cata_Choices is a homomorphism over the initial data type
Choices: it returns a function that replaces each data constructor occurring in
its argument with the corresponding function from the argument of cata_Choices.
The function mkparser tuples a Choices s a structure cs with a real parser.
This demonstrates another important technique that can often be applied when
writing functions that can be seen as an interpreter: partial evaluation. In
our case the “program” corresponds to the choice structure, and the input of
the program to the input of the parser. The important step here is the call to
cata_Choices that maps the choice structure to a function that chooses which
parser to call. This resulting function is then used in the actual parser to
select the parser that applies at this position (choices input), and that
parser is then called: (\ f h e input -> (choices input) f h e input).

12 Conclusions and Further Reading 

We have shown how, by making use of polymorphism, type classes, higher order
functions and lazy evaluation, we can write small libraries for constructing
efficient parsers. In defining parsers with this library all features of
a complete programming language are available.

Essential for the description of such libraries is the availability of an advanced 
type system. In our case we needed the possibility to incorporate universally 
quantified types in data structures. 

We expect that, with more advanced type systems becoming available, special 
purpose tools will gradually be replaced by combinator libraries. 

Abstract. IBM SanFrancisco* provides application developers with
a base set of object-oriented infrastructure and application logic
components. These components are delivered as a set of frameworks, currently
covering the domains of general ledger, warehouse management, order
management, accounts receivable, and accounts payable. After a brief
overview of the IBM SanFrancisco frameworks, their structure and
contents, we will discuss the requirements for new application development
tools, and practical research done to show the effectiveness of such tools.



1 Introduction 

Components are quickly becoming mainstream in business application
development. The internet revolution has forced application development
organizations to find more effective ways of development, and the use of
components is one way to provide better, more robust applications faster.

The use of components does not make application development easier. The
real development effort of using components includes the learning curve of
becoming familiar with the functionality of the component and its programming
interface. This learning curve can be very steep and should be taken into
account when an organization adopts component technology. Another factor easily
overlooked is that state-of-the-art component frameworks will use Object
Technology and will be built in the Java** programming language. Most
traditional development organizations need time, typically a year, to adapt to
these new programming paradigms.

Appropriate tools will help with application development using components.
Tools such as Rational Rose** are commonly used during analysis and design of
object-oriented applications, and IBM’s VisualAge* for Java is a prominent
integrated development environment (IDE) in this area. These tools are adapting
to component frameworks, but are still lacking features. This article surveys
which features are needed for effective tools to be used with component
frameworks. We do so by looking at the architecture of one such framework, IBM
SanFrancisco, and then deriving requirements to support this architecture in
application development. Next, we will look at an example of a new tool
architecture, developed by the author, which is capable of supporting these
requirements in a flexible way.

Note: Trademark acknowledgements can be found at the end of this article. They
are indicated with * and ** in the text.

The IBM SanFrancisco project is a very large component building effort. It is
unique because it includes not only middleware, but also a large base of
business components. We give an overview of SanFrancisco in the next section;
more information can be found in [4].

2 Overview of SanFrancisco 

The IBM SanFrancisco shareable frameworks consist of a distributed object
infrastructure and a set of application business components. SanFrancisco
delivers three layers of reusable code written in Java, for use by application
developers. They are:



— Foundation layer 

— Common Business Object Layer 

— Core Business Process layer 



[Figure: layered stack, from top to bottom: Application Software; IBM
SanFrancisco Business Process Components; Common Business Objects; Foundation;
Java Virtual Machine; Platforms (NT, Solaris, HP/UX, MVS). Shown alongside:
Development Tools, and Clients (Java, Browsers, Active-X, Lotus).]

Fig. 1. The structure of SanFrancisco



Fig. 1 shows the layered architecture of SanFrancisco. In addition to the three
layers just mentioned, you see Applications, which can be built on top of any of
these layers. From an application development point of view, the benefits of
using SanFrancisco will depend on which layer is used as a base for the
application. The benefits will be highest when the Core Business Processes are
used as a base. On this level, typically 40-60% of the functionality of an
application is already provided by the framework. It allows the application
developer to concentrate on differentiators, not on basic function. In the
author’s experience, many developers think at first that the SanFrancisco model
is too complicated for their purpose. Gradually, during development, they
usually start to appreciate the flexibility of the framework.

From a technology perspective, applications based on shared frameworks,
object-oriented technology and Java provide the best opportunity for
interoperability across multiple applications and platforms. Eventually,
customers will be able to purchase applications which are more flexible, allow
a choice of platforms and allow interoperability with a choice of applications.



2.1 Foundation Layer 

The Foundation layer provides the infrastructure and services that are required
to build applications in a distributed, multi-platform environment. Its main
services are Utilities (for example installation, configuration, and
administration) and Kernel services (for example transaction, persistence, and
security).

The Foundation layer also provides a basic object structure, from which
all other components in the framework inherit. The major object model base
classes are Entity and Dependent. Entities are persistent, transactable and
distributable. All major components inherit from Entity. Dependents do not
have an independent life-cycle. They are more light-weight and are only
persistent when owned by an Entity instance.
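
The distinction between the two base classes can be sketched in a few lines of
Java. The classes below are hypothetical stand-ins, not the SanFrancisco Entity
and Dependent classes, and the in-memory store is an assumption made only to
show that a dependent is saved through its owner and never on its own.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-ins; these are NOT the SanFrancisco Entity and Dependent classes.
abstract class SketchDependent {
    // No identifier of its own: a dependent is reachable only through its owner.
}

abstract class SketchEntity {
    private final String id;                                   // independent identity
    private final List<SketchDependent> owned = new ArrayList<>();

    SketchEntity(String id) { this.id = id; }

    String id() { return id; }

    void own(SketchDependent d) { owned.add(d); }

    List<SketchDependent> owned() { return owned; }
}

final class PostalAddress extends SketchDependent {
    final String city;
    PostalAddress(String city) { this.city = city; }
}

final class Customer extends SketchEntity {
    Customer(String id) { super(id); }
}

// A toy in-memory store: only entities can be saved; dependents travel with their owner.
final class SketchStore {
    private final Map<String, SketchEntity> entities = new HashMap<>();

    void save(SketchEntity e) { entities.put(e.id(), e); }

    SketchEntity load(String id) { return entities.get(id); }

    int storedDependents() {
        int n = 0;
        for (SketchEntity e : entities.values()) {
            n += e.owned().size();
        }
        return n;
    }
}
```

In this sketch a PostalAddress that is never handed to own(...) cannot reach
the store at all, which mirrors the rule that a Dependent is only persistent
when owned by an Entity instance.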



2.2 Common Business Objects 

The Common Business Objects layer provides implementations of business objects
that are common to more than one domain. The Common Business Objects can also
be used as a base for interoperability between applications. We enumerate the
currently existing common business objects here, to show the breadth of
functionality provided:

— General-purpose business objects: Company, Business Partner, Address,
Initials, Classification, Number series

— Common business processes: Cached balance, Life Cycle, Project, Dynamic
Identifier, Key

— Culture-dependent business objects: Currency, Validation results

— Commonly used financial business objects: Bank accounts, Payment terms,
Payment method, Settlement terms, Calendars (natural, fiscal, periodized),
Financial batch, Financial Integration

The names of the Common Business Objects mostly speak for themselves.

2.3 Core Business Processes

The Core Business Processes layer provides business objects and default
business logic for selected vertical domains. SanFrancisco currently delivers
business components in the domains of accounts receivable, accounts payable,
general ledger, order management (sales and purchase), and warehouse
management.

Each domain process can be used independently or in conjunction with other
processes. The currently implemented domains include:

— Financials: Ledger, Credit control, Item entry, Payment handling,
Budgeting;

— Warehouse Management: Warehouse control, Stock replenishment, Picking
stock;

— Order Management: Quotations, Sales orders, Cash sales.

3 Architectural Aspects of SanFrancisco 

Software architecture is the specification of the structure of a system. This
can include system organization, component structure, protocols for
communication, data access, composition of design elements, etc.; see [5]. We
split the description of architectural aspects into two broad areas: System
Organization and Component Structure.



3.1 System Organization 

The interesting aspects of the SanFrancisco system organization are: the Java 
environment on which it is built, its layered architecture, and its distribution 
infrastructure. 



3.1.1 The Java Environment. The architecture of SanFrancisco is deeply 
rooted in the Java environment. By now, the advantages of Java are widely 
known. Java is a full-fledged Object Oriented language, accepted by the indus- 
try as the major application development language. It features portability, dis- 
tributed objects, integration with databases, among others. The investment in 
the IT industry to build tools and applications for the Java environment is high. 



3.1.2 Layered Architecture. The SanFrancisco components, which are
developed as a pure Java framework, can exploit the advantages of the Java
environment directly. We saw in Fig. 1 that the lowest layer of the
SanFrancisco architecture consists of the Java virtual machine. The other
layers of the SanFrancisco system architecture build on the Java layer by
providing, successively, infrastructure functionality in the Foundation layer,
common domain functionality in the Common Business Objects layer, and specific
domain functionality in the Core Business Processes layer.



136 Ghica van Emde Boas 



3.1.3 Distribution. SanFrancisco has implemented a distribution
infrastructure. The distribution facilities allow application developers to
obtain platform independence and persistence for their application components.
SanFrancisco provides the clients which make use of these application
components with functions to create instances of distributed objects, to refer
to an existing persistent object, to create new objects, and to invoke methods
on distributed objects. Further, objects are uniquely identified in the
network, and locking and commitment control are taken care of by SanFrancisco.

3.2 Component Structure 

SanFrancisco is organized as an object-oriented Component Framework. The
framework consists of a set of Business Components which interact with each
other.

A component framework is not the same as a class library, which is familiar to
most object developers. Class libraries are static structures and do not embody
any kind of application flow. Business Process Components are dynamic: they
make use of an architecture, and they represent both design and implementation.
Essentially, Business Process Components are like miniature applications, which
can be customized and extended.

3.2.1 Model Driven Development. An important point to note here is
that the complete SanFrancisco framework is documented as a UML** model.
New components to be used in application development will be added to the
already existing SanFrancisco model during analysis. The implementation will
use the model and implement the components as specified in the model, using
the rules in the SanFrancisco programming model.

What is the development process associated with model driven development for
SanFrancisco? A development process guides you through the process of building
solutions on top of SanFrancisco business application components. It lists
activities that you can follow and it indicates a logical sequence between some
of these activities. The process used for SanFrancisco application development
is not very different from standard object technology oriented development
processes. A main point to note is that it emphasizes mapping activities to
SanFrancisco business components as part of the process.

If we restrict ourselves to the activities of application development itself 
(forgetting about assessments, project management, quality assurance, etc), we 
see the following activities: 

— requirements collection, 

— requirements mapping, 

— analysis, 

— design, 

— coding and unit test. 

After each of the activities, mapping to SanFrancisco components will occur. 
This can cause re-iteration through previous phases, to find the best fit to the 
framework. 

3.2.2 Design Patterns. The SanFrancisco Framework makes extensive use of
design patterns in order to provide an easy way to understand common object
models and their respective solutions. Published design patterns are used [3]
and new ones were created. For example, in the Foundation Layer, the Factory
pattern is used to create business objects. The Property Container pattern
allows applications to add attributes to an object at run-time, without
changing the basic structure of the object’s class.
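
The two patterns named above can be sketched as follows. This is a minimal
illustration under stated assumptions: the class names FactorySketch,
PropertyContainerSketch and Product, and all method signatures, are invented
for this example and are not the SanFrancisco programming interfaces.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical sketch; names and signatures are illustrative, not SanFrancisco APIs.
class PropertyContainerSketch {
    private final Map<String, Object> properties = new HashMap<>();

    // Attributes are added per instance at run time; the class itself is unchanged.
    void addProperty(String name, Object value) { properties.put(name, value); }

    Object getProperty(String name) { return properties.get(name); }

    boolean hasProperty(String name) { return properties.containsKey(name); }
}

class Product extends PropertyContainerSketch {
    final String name;
    Product(String name) { this.name = name; }
}

// Clients ask the factory for objects by logical name instead of calling "new" directly,
// so the concrete class can be swapped without touching client code.
final class FactorySketch {
    private final Map<String, Supplier<? extends PropertyContainerSketch>> creators = new HashMap<>();

    void register(String logicalName, Supplier<? extends PropertyContainerSketch> creator) {
        creators.put(logicalName, creator);
    }

    PropertyContainerSketch create(String logicalName) {
        Supplier<? extends PropertyContainerSketch> creator = creators.get(logicalName);
        if (creator == null) {
            throw new IllegalArgumentException("No class registered for " + logicalName);
        }
        return creator.get();
    }
}
```

A client would register a creator once, for example
register("Product", () -> new Product("widget")), then obtain objects with
create("Product") and attach an attribute such as "colour" to a single
instance with addProperty, without any change to the Product class.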



3.2.3 Framework Extensions. Application developers need a common way
to extend frameworks in order to preserve the design integrity of the original
framework. In addition, common extension mechanisms provide:

— Consistency.

— Interoperability of applications from different vendors.

— Isolation of the framework changes made by application developers to
a limited number of classes. This makes maintenance easier and allows upward
compatibility for new releases of the framework.

SanFrancisco provides extension points that identify where and how to extend
the framework.

3.2.4 SanFrancisco Programming Model. One of the unique features
of SanFrancisco is the ability to run in a network of interconnected servers,
where the placement of data in containers is transparent to the application
which accesses it. The SanFrancisco programming model supports consistent
programming interfaces for using and developing business objects. This
programming model prescribes both the coding of the Business Components which
reside on the server, and the programming conventions for a client which
accesses these components.

3.2.5 SanFrancisco and Enterprise JavaBeans. Enterprise JavaBeans** (EJB)
is a new component architecture [6] for scalable, transactional, and
multi-user secure Java applications. The next version of SanFrancisco will be
based on this important architecture. It will allow developers to use other EJB
compliant components together with SanFrancisco components. Customers will
have a choice of EJB server which can be used to deploy SanFrancisco
applications, instead of the proprietary SanFrancisco Foundation Layer.

4 Traditional Ways to Develop SanFrancisco Applications 

Although the tradition is not very long, only a few years, it has become common
practice in object-oriented application development to use an analysis/design
tool such as Rational Rose or Select** to generate code skeletons from the
model specifications, and to use an IDE such as IBM’s VisualAge for Java to
complete the code and add a user interface.

With the SanFrancisco product a code generator is provided which generates
95% of the server code required for a SanFrancisco application. Business logic
has to be added to the code. In the latest release of SanFrancisco (v1.4),
a bean wizard can generate SanFrancisco Beans to access the server objects, and
Java Swing panels, which can be developed further using VisualAge for Java. To
produce well-performing, well-designed applications, further coding and tuning
has to be done by the developer.

When code skeletons are re-generated from the model, it is a problem to
re-insert the business logic and other changes: this has to be done by a
cut-and-paste process. A code generator which is able to preserve code is
planned. For the bean wizard, there is no such plan yet.
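
One common way to preserve hand-written code across re-generation is to mark
"protected regions" in the generated source and copy their contents into the
new skeleton. The sketch below illustrates that general technique only; the
text does not say how the planned SanFrancisco generator works, and the marker
comments and the class RegionMerger are assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of "protected regions"; the marker syntax is an assumption.
final class RegionMerger {
    private static final String BEGIN = "// BEGIN USER CODE ";
    private static final String END = "// END USER CODE";

    /** Collects the lines between BEGIN <id> and END markers, keyed by region id. */
    static Map<String, List<String>> extract(List<String> oldSource) {
        Map<String, List<String>> regions = new HashMap<>();
        String current = null;
        for (String line : oldSource) {
            String t = line.trim();
            if (t.startsWith(BEGIN)) {
                current = t.substring(BEGIN.length()).trim();
                regions.put(current, new ArrayList<>());
            } else if (t.equals(END)) {
                current = null;
            } else if (current != null) {
                regions.get(current).add(line);
            }
        }
        return regions;
    }

    /** Copies the freshly generated skeleton, refilling regions that existed before. */
    static List<String> merge(List<String> oldSource, List<String> newSkeleton) {
        Map<String, List<String>> saved = extract(oldSource);
        List<String> out = new ArrayList<>();
        boolean skipping = false;   // true while dropping the skeleton's default region body
        for (String line : newSkeleton) {
            String t = line.trim();
            if (t.startsWith(BEGIN)) {
                out.add(line);
                String id = t.substring(BEGIN.length()).trim();
                if (saved.containsKey(id)) {
                    out.addAll(saved.get(id));
                    skipping = true;
                }
            } else if (t.equals(END)) {
                out.add(line);
                skipping = false;
            } else if (!skipping) {
                out.add(line);
            }
        }
        return out;
    }
}
```

The design choice here is that the model owns everything outside the markers
and the developer owns everything inside them, so a design change can add
a field to the skeleton without disturbing the business logic.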

4.1 A Wish List for SanFrancisco Development Tools 

In the previous sections we described why, from the perspective of application
development, technology, and customers alike, there is interest in using
framework technology such as SanFrancisco provides. Despite the advantages
a framework can offer, it has been found that in practice application
development is more difficult than expected. There are two major reasons for
this:

1. Steep learning curve for object technology and framework content.

2. Tools are not yet well adapted to be used with frameworks.

Most major modeling tool and IDE (Integrated Development Environment) vendors
have introduced new versions of their tools to accommodate the Java programming
language and UML, the Unified Modeling Language. To adapt to large frameworks
such as SanFrancisco, more is needed.

— Some modeling and development tools do not scale well towards 2000+
components (the number of components SanFrancisco provides).

— The same is true for code generation technology for such a large programming 
model as SanFrancisco defines. Many code generators provided by modeling 
tool vendors are not easily customizable. Therefore, it is often impossible or 
very difficult to adapt code generators to specific framework or user require- 
ments. 

— The gap between the modeling and programming tools is still wide. Many 
code generators require cutting and pasting after re-generation of code 
because of a design change. 

— The tools are mostly not written in Java yet. They often run on a limited 
set of platforms, and have fixed functionality. They cannot integrate with 
the target environment. 

— Support for components as black-box building blocks is lacking in most tools 
and in UML itself. 

Because of these problems, good tools targeted at the SanFrancisco framework 
were slow to appear and the ones which are there now, still lack functionality. 

It seems that there is room for a new tool architecture. This architecture 
should implement the following wish list: 

— Able to run on all Java platforms.

— Full UML support. 

— Full Java support. 

— Supports components. 

— Exploit new document facilities, and use them to store model information,
program source code and design documentation (XML and XMI come to mind
here).

— Customizable to user requirements.

— Flexible towards technology changes.

— One integrated, but configurable tool, combining modeling and code devel- 
opment facilities. 

— Generate code for both server and client code (for a variety of clients, both 
fat and thin). 

— The code generator would be customizable by the developer. 

— Changes would be preserved after code generation. 

— The tool would be small, but scalable. 

— The tool would integrate with SanFrancisco, allowing you to perform utility 
functions etc. 

Do we need to create a monster to fulfill the requirements sketched here? We do
not think so. An often overlooked feature of the Java language environment is
that Java classes are linked dynamically into the runtime environment. No other
third generation language has this feature, including C++ or Smalltalk. This
allows a tool developer to develop a set of building block components for each
target environment and package them according to the needs of a user. Massive
recompilation, for which C++ is famous, or the difficulty of accommodating
conflicting requirements within one image, for which Smalltalk is known, are
never a problem with Java.
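
Dynamic linking can be made concrete with a short sketch that loads a building
block by its class name at run time, using the standard Class.forName call.
The interface ToolBlock, the two block classes and BlockLoader are hypothetical
names introduced for this example; they are not taken from any existing tool.

```java
// Hypothetical sketch; ToolBlock and the block classes are illustrative names.
interface ToolBlock {
    String describe();
}

class ModelingBlock implements ToolBlock {
    public String describe() { return "UML modeling"; }
}

class CodeGenBlock implements ToolBlock {
    public String describe() { return "code generation"; }
}

final class BlockLoader {
    /**
     * Loads a building block by class name at run time. The loader is compiled
     * without any reference to the concrete class it will instantiate.
     */
    static ToolBlock load(String className) throws ReflectiveOperationException {
        Class<?> cls = Class.forName(className);
        return (ToolBlock) cls.getDeclaredConstructor().newInstance();
    }
}
```

In a configurable tool the class names would typically come from
a configuration file, so that a different package of building blocks can be
assembled for each user without recompiling the tool itself.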

Other opportunities in Java to make programs which are flexible and
customizable are its JavaBean facilities, which are the Java implementation of
components, and resource bundles, which can store in a file translations of
program items in various languages, or configuration directives.
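
As a small illustration of resource bundles, the sketch below reads
properties-style text into a java.util.PropertyResourceBundle. The keys, the
texts and the class LabelBundles are made up for this example, and the bundle
contents are inlined as strings only to keep the example self-contained; in
practice they would live in .properties files, one per language.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.PropertyResourceBundle;
import java.util.ResourceBundle;

// Hypothetical sketch: the keys and texts are made up for illustration.
final class LabelBundles {
    // A translated label and a configuration directive, side by side.
    static final String ENGLISH = "menu.save=Save project\ngenerator.indent=4\n";
    static final String DUTCH = "menu.save=Project opslaan\ngenerator.indent=4\n";

    /** Parses properties-style text into a ResourceBundle. */
    static ResourceBundle parse(String text) {
        try {
            return new PropertyResourceBundle(new StringReader(text));
        } catch (IOException e) {
            throw new IllegalStateException("Cannot read bundle text", e);
        }
    }
}
```

The program code only ever asks for the key, for example
getString("menu.save"), so adding a language or changing a directive means
editing a file rather than the program.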

The remainder of this article describes Business Component Prototyper. BC- 
Prototyper fulfills some of the requirements described above. 

5 Business Component Prototyper 

Business Component (BC-) Prototyper is an experiment in tool architecture, in
an attempt to research some of the problems sketched above. It shows a proof
of concept by providing an amazingly broad, but not very deep or polished,
functionality. We do not pretend that BC-Prototyper has an answer for all items
on the wish list; we just claim that our architecture would enable a solution
to most requirements on the wish list. We also do not think BC-Prototyper can
compete with any of the “big” tools; we just want to make clear that these
tools could benefit from the ideas we provide.

Our proof that the BC-Prototyper architecture is effective is mainly
a hand-waving one. When shown its functionality, many people find it hard to
believe that this tool was basically developed by just one person in spare
time. A strict building block architecture, heavy use of code generation, and
heavy use of the tool to build itself have made these results possible.

Currently, Business Component Prototyper is an evaluation tool, provided
with SanFrancisco v1.4. Its objective is to help new SanFrancisco developers to
understand the SanFrancisco foundation layer programming model more easily,
by experimenting with creating some prototypes which can actually run.
BC-Prototyper is intended for evaluation, education, and simple prototyping
use. For the development of production applications, other modeling tools
should be used. The version of BC-Prototyper described here has more
functionality than the version provided with SanFrancisco. This functionality
is experimental in nature.



5.1 Overview 

It will be helpful to describe some of the functionality of BC-Prototyper first. 

BC-Prototyper is a pure Java tool that can run on any client platform
supported by SanFrancisco. It is a simple modeling tool, featuring UML, and it
has some integrated development environment (IDE) capabilities, such as editing
or compiling Java code. Additionally, the tool has code generation facilities,
and interfaces to many SanFrancisco utility functions. A major feature of
BC-Prototyper is that it can produce running applications, including
a Graphical User Interface (GUI).

Very briefly, developing a SanFrancisco application using BC-Prototyper 
would look as follows: 

— Develop a static object model. 

— Generate and compile Java code. 

— Generate proxies and add names of classes to the SanFrancisco naming ser- 
vice. 

— Start and run the application. 

— Iterate, to add business logic, or to customize the GUI. 



5.2 BC-Prototyper Building Blocks 

When the tool is started, a window will appear with three parts (see Fig. 2):

1. A menu bar and tool bar. The toolbar buttons show the main actions a user
can perform in the SanFrancisco configuration of the tool, for example:

   — Create, open, or save a project, edit project properties.

   — Create a new class or relationship, import a component, edit any of
   these model items.

   — Perform a build for selected components (generate server, client and GUI
   code, compile, generate proxies).

   — SanFrancisco utilities: Start/Stop SanFrancisco, Append names to the
   SanFrancisco name space, Start/Stop the application, Show help
   information.

2. A workarea, with one or more tab-panels.

   — The first tab-panel has the modeling functionality. It displays the
   currently loaded model in tree form and in UML-diagram form.

   — The second tab-panel provides Java development functions, and access
   to some of the SanFrancisco utility features. It shows a list of .java
   files in the package directory of the current project, a list of .java
   files for which proxies can be generated, and buttons to compile selected
   .java files, show them in a simple editor, or generate a proxy. The
   generated file which contains the SanFrancisco naming information for the
   project can also be viewed and updated in this panel.

3. A message area, where the actions the tool performs are logged, or where
errors are displayed.

Fig. 2. BC-Prototyper window, the modeling tab-panel

In Fig. 2, the modeling tab-panel is visible. It provides all the usual
modeling functions to create or edit a static UML model. In addition to what
most modeling tools provide, such as creating a class, an attribute with scope,
type etc., in BC-Prototyper you can specify how a class or attribute should be
presented in the GUI.

Fig. 3. The development tab-panel in BC-Prototyper

You can also specify specific SanFrancisco characteristics. For example, you
can say that a class should be a SanFrancisco Entity or Dependent, or that
a relation between two classes should be implemented as an EntityOwningExtent
collection. For an attribute you can specify that in the GUI its value should
be shown as a TextField or TextArea etc.

BC-Prototyper provides specific support for Components. The components
are shown as green class shapes on the drawing area. In Fig. 2,
DescribableDynamicEntity and Address are components, taken from the
SanFrancisco framework. From a modeling point of view, components are just like
classes which are predefined and put in a class library. Whether the component
actually consists of one or 100 real classes is not important when you try to
use one. What is important is to know its interface and functionality.
Unfortunately, the SanFrancisco components are not specified that way in the
Rose** models provided with the product. You would see the complete static
structure of a component when using Rational Rose. For BC-Prototyper, we have
chosen to re-engineer the component specifications from the Java .class files.
This has the added advantage that any .class file can be made into a component,
not just SanFrancisco model fragments.

The white class shape shows an interface. Interfaces are predefined, like com- 
ponents. Additionally, interfaces can contain code generation fragments. This 
allows you to provide a default implementation for a Java interface. The
developer usually does not have to write any code to implement the interface.

The tree view shows details of the model; updates can be made by
double-clicking one of the items in the tree, which causes a property editor to appear.
Note that the tree view shows a class, DynamicEntity, which is not present in 
the model diagram. This class is the superclass of DescribableDynamicEntity. 
It is needed to generate the correct code, but for readability reasons it is made 
invisible in the diagram. 

When the specification of the UML model is complete, code can be generated.
BC-Prototyper provides sets of code generation templates. The main sets
available are templates for a stand-alone application, such as BC-Prototyper
itself, and templates for a SanFrancisco application. When SanFrancisco code
generation is applied to the AddressBook example, a set of .java files is
generated, as shown in Fig. 3, along with some other files, such as HTML
documentation for the model.

When the code is generated and compiled, and proxies are created, the next
step is to add the name tokens for the newly defined classes to the
SanFrancisco naming configuration. A utility is provided to perform this function.
The SanFrancisco servers will automatically be started if necessary.






Fig. 4. The generated GUI for the AddressBook Example 

Next, the application can be started. The panels you may see in the running
application are the AddressBook panel and the Address panel; see Fig. 4. Note
that the fields in the Address panel are all generated from the information in
the imported component.

Fig. 5. Generated GUI for BC-Prototyper itself



5.3 GUI Generation 

At the user's request, BC-Prototyper generates not only model code, which
implements the structure of the model as defined by the user, but also a GUI,
which can be customized later.

Simplified, the rules for generating a GUI are as follows. A class in a model maps
to a window, optionally with a set of tab-panels. Attributes in a class map to
a TextField, TextArea, Checkbox, Choice, or Button, with a Label if appropriate.
Classes with a contained one-to-many relationship map to a List in the
containing class.
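
The rules above can be read as a small decision function. The following is a minimal sketch under stated assumptions, not BC-Prototyper's actual code: the Attribute class, its fields, and the order in which the rules are tried are all hypothetical, and the Button and Label cases are left out for brevity.

```java
import java.util.Collections;
import java.util.List;

// Hypothetical sketch of the attribute-to-widget rules; not BC-Prototyper's code.
class WidgetRules {
    enum Widget { TEXT_FIELD, TEXT_AREA, CHECKBOX, CHOICE, LIST }

    // Minimal stand-in for a model attribute; every field name is an assumption.
    static final class Attribute {
        final String name;
        final String type;
        final boolean multiLine;           // long text that needs several lines
        final List<String> allowedValues;  // non-empty for enumerated values
        final boolean oneToMany;           // contained one-to-many relationship

        Attribute(String name, String type, boolean multiLine,
                  List<String> allowedValues, boolean oneToMany) {
            this.name = name;
            this.type = type;
            this.multiLine = multiLine;
            this.allowedValues = allowedValues;
            this.oneToMany = oneToMany;
        }
    }

    // Applies the rules in a fixed order; the first matching rule wins.
    static Widget widgetFor(Attribute a) {
        if (a.oneToMany) return Widget.LIST;
        if ("boolean".equals(a.type)) return Widget.CHECKBOX;
        if (!a.allowedValues.isEmpty()) return Widget.CHOICE;
        if (a.multiLine) return Widget.TEXT_AREA;
        return Widget.TEXT_FIELD;
    }

    public static void main(String[] args) {
        List<String> none = Collections.emptyList();
        Attribute city = new Attribute("city", "String", false, none, false);
        Attribute addresses = new Attribute("addresses", "Address", false, none, true);
        System.out.println(city.name + " -> " + widgetFor(city));
        System.out.println(addresses.name + " -> " + widgetFor(addresses));
    }
}
```

In the tool itself the presentation of an attribute is specified in the model, so a real generator would consult that specification first and fall back on defaults like these only when nothing is specified.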

A simple example is the generated GUI for the AddressBook example. See 
Fig. 4. Note that we did not do anything to customize the generation. We could 
have entered nicer labels in our model, categorized the fields into more tab- 
panels, etc. The window for AddressBook contains fields which it inherits from 
DescribableDynamicEntity or DynamicEntity. If you click on the Addresses tab, 
you see the generated list for the one-to-many relationship to the Address com- 
ponent. From this panel you can open existing, or create new Address objects. 

Another good example of a generated GUI is the user interface of BC-Prototyper
itself. When we apply the rules to the meta-model, we obtain
generated windows for the class property editor and attribute property editor,
as in Fig. 5. These windows are actually used as the interface to BC-Prototyper to
enter values into the model a user creates.

5.4 Use of Components in BC-Prototyper 

Any development tool which supports a framework like IBM SanFrancisco, 
should have a strategy to support components. 

The solution chosen for BC-Prototyper is to import information from .class
files. This is a very general solution: it can be applied to any compiled Java file.
For the purpose of SanFrancisco, some of the programming model patterns have
been applied in the re-engineering process. It was found that some information
cannot be extracted fully from the class files (initialize(...) methods etc.). This
information is added manually.
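
To illustrate the idea of recovering a component specification from compiled code, here is a minimal sketch that uses standard Java reflection on an already loaded class. It is not the tool's importer: the paper does not say how the .class files are read, and both the getter naming convention and the Address stand-in class are assumptions made for this example.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch: derive attribute names from the public getters of a class.
class ComponentImporter {

    static List<String> attributeNames(Class<?> cls) {
        List<String> names = new ArrayList<String>();
        for (Method m : cls.getMethods()) {
            String n = m.getName();
            boolean getter = n.startsWith("get") && n.length() > 3
                    && m.getParameterTypes().length == 0
                    && m.getReturnType() != void.class
                    && m.getDeclaringClass() != Object.class   // skips getClass()
                    && !Modifier.isStatic(m.getModifiers());
            if (getter) {
                // getCity -> city
                names.add(Character.toLowerCase(n.charAt(3)) + n.substring(4));
            }
        }
        Collections.sort(names);  // getMethods() returns methods in no fixed order
        return names;
    }

    // Illustrative stand-in for a compiled component class.
    public static class Address {
        public String getCity() { return "Milovy"; }
        public String getStreet() { return "Main Street"; }
        public void initialize(String city) { /* arguments carry no attribute info */ }
    }

    public static void main(String[] args) {
        System.out.println(attributeNames(Address.class));
    }
}
```

The initialize method in the stand-in class shows the kind of information that introspection alone does not turn into attributes, which is consistent with the remark that some details have to be added manually.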

Once a component is put on the BC-Prototyper work area, it can be used
like any class defined by the user. You should not change the contents of the
server part, but you can change the view specification in the component.

In Fig. 2, you can see the list of Java source files for the Address component.
An AddressPanel, AddressPart and AddressWrapper are generated to
provide client and GUI code for the component. Contrast this with AddressBook,
a newly created Business Object, for which AddressBook.java,
AddressBookImpl.java and AddressBookFactory.java are also generated.



6 The Architecture of BC-Prototyper 

The architecture of BC-Prototyper can best be characterized as a “spider”
architecture. The body of the spider consists of an object-oriented structure, which
is an implementation of the meta-model of the tool. Its legs are filters through
which models stored in some external format can be put into the body of the
spider, or filters through which external representations can be generated.

The BC-Prototyper architecture is described in more detail in [2]. Here we 
give an overview of the main items. 

The design of BC-Prototyper shows four major parts: 

1. A mini-framework for use by BC-Prototyper itself, and by the applications
which are built with it.

2. The spider body, consisting of generated code from the meta-model. It
implements the data structures for the model information, and the user interface
in the form of property editors for the model.

3. Core tool functionality, such as the code generator/template interpreter and
model drawing routines.

4. A set of configurable, bean-like modules, the plugin adapters, encapsulating 
menu options and/or toolbar button actions, or tab-panel functionality. An 
.ini file determines at start-up what the functionality of BC-Prototyper will 
be, by providing a list of plugins to include. 

The external file formats can include Java source files, Java class files, HTML 
documents, XML documents, .bat files and others. The spider legs are more tech- 
nically called plugin-adapters. Plugins are coded as JavaBeans. The availability 
of a plugin is determined by a configuration file, which is read when the tool 
starts. In this way, the tool can be configured for different purposes or for dif- 
ferent code generation target environments. 
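
The configuration mechanism described here can be sketched as follows. This is only an illustration under stated assumptions: the Plugin interface, the JavaExportPlugin class, and the one-class-name-per-line configuration format are hypothetical, since the paper does not give the layout of the .ini file or the bean interface.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of start-up plugin loading from a list of class names.
class PluginLoader {

    // Assumed plugin contract; the paper only states that plugins are JavaBeans.
    public interface Plugin {
        String label();
    }

    // Example plugin with the public no-argument constructor a bean needs.
    public static class JavaExportPlugin implements Plugin {
        public JavaExportPlugin() { }
        public String label() { return "Export Java source"; }
    }

    // Instantiates every configured class; blank lines and '#' comments are skipped.
    static List<Plugin> load(List<String> configLines) {
        List<Plugin> plugins = new ArrayList<Plugin>();
        for (String line : configLines) {
            String className = line.trim();
            if (className.isEmpty() || className.startsWith("#")) continue;
            try {
                Object bean = Class.forName(className).getDeclaredConstructor().newInstance();
                plugins.add((Plugin) bean);
            } catch (ReflectiveOperationException e) {
                throw new IllegalStateException("Cannot load plugin " + className, e);
            }
        }
        return plugins;
    }

    public static void main(String[] args) {
        List<String> config = Arrays.asList(
                "# plugins to include at start-up",
                JavaExportPlugin.class.getName());
        for (Plugin p : load(config)) {
            System.out.println("Loaded: " + p.label());
        }
    }
}
```

Because the plugin classes are named only in the configuration, a different list yields a differently configured tool without recompiling the core.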

6.1 The Code Generator

We described the GUI generation rules and the use of components in BC- 
Prototyper as functional items in the previous section. We consider code gen- 
eration an architectural feature of BC-Prototyper, because it had a profound 
influence on the way the tool was developed. 

The server-side programming model of SanFrancisco is not very complex, but 
large in size. Client code for transaction management and GUI adds considerably 
to the complexity of the code. For our experiment in architecture, a very efficient 
approach would be necessary, and we found it in using XML code generation 
templates. 

Code generation with XML templates in BC-Prototyper is an interaction
between the contents of these templates, a template interpreter (written in Java)
and introspection into the meta-model of the tool. The template contents
are Java source code (or HTML, or project files for an IDE, or SanFrancisco
configuration files, or whatever you need), surrounded with XML tags; in
some places the source code is replaced by XML elements.

To get a flavor of what these templates are like, here is a snippet of a tem- 
plate for the simplest of examples. It generates “getter” methods for all public 
attributes for a class in the model: 

    <Rule>
        <Target>Attributes</Target>
        <Condition> scope = public </Condition>
        public &type; get&u.name;() {
            return iv&u.name;;
        }
    </Rule>

The u. is a shorthand for putting the first character of the replacement value in
uppercase. Otherwise, the snippet should be self-explanatory. For example, if the
name of one of the public attributes of an Address class was “city”, and its type
“String”, then this template would generate:

    public String getCity() {
        return ivCity;
    }
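
The substitution step of such a template can be sketched in a few lines of Java. This is a minimal illustration, not the tool's template interpreter: the TemplateExpander class is hypothetical, the attribute values are passed in a plain map instead of being obtained by introspection into the meta-model, and the Rule, Target and Condition handling is omitted.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: expands &name; and &u.name; references in a template body.
class TemplateExpander {
    // Group 1 is the optional "u." prefix, group 2 the property name.
    private static final Pattern REF = Pattern.compile("&(u\\.)?([A-Za-z]+);");

    static String expand(String body, Map<String, String> values) {
        Matcher m = REF.matcher(body);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = values.get(m.group(2));
            if (value == null) {
                throw new IllegalArgumentException("Unknown property: " + m.group(2));
            }
            if (m.group(1) != null && !value.isEmpty()) {
                // "u." puts the first character of the value in uppercase.
                value = Character.toUpperCase(value.charAt(0)) + value.substring(1);
            }
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> attribute = new LinkedHashMap<String, String>();
        attribute.put("name", "city");
        attribute.put("type", "String");
        String body = "public &type; get&u.name;() {\n    return iv&u.name;;\n}";
        System.out.println(expand(body, attribute));
    }
}
```

With the values name = city and type = String, the sketch produces the same getter as the example above; a full interpreter would repeat this for every attribute that satisfies the Condition of the rule.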

The adoption of this code generation technique had a large effect on Business
Component Prototyper. Not only was the author able to complete a code
generator for SanFrancisco in a very short period, with more functionality than
any other tool could provide at the time; the author was also able to develop
BC-Prototyper itself into a professional-looking tool in the same short period.
Why? Because changes to the meta-model of the tool, and to the code generator,
are immediately available to the tool itself, and the tool is developed using the
tool. This had a snowball effect on the tool development.

The technical reason that XML in conjunction with introspection in Java is
very effective may be the following:

— Generated source code or documentation is a flat, serialized representation of
an object structure. The mapping of the object structure to a flat representation
can be fairly complex. An XML template is an implicit implementation
of this mapping. When the mapping is already done, the generator can be
simple.

— XML templates are readable and editable by humans. By changing the XML 
templates, you can change the mapping, and therefore the generated code, 
without having to re-code or recompile the code generator. New generators 
are easily made by adding tags to existing source code or documentation. 
Users can specify their own code generation, or easily customize the templates
provided.

Part of the effectiveness of the tool development is also due to its “spider”
architecture. The body of the spider contains the generated code from the meta-model,
and is easily adaptable by adapting the meta-model or the code generation
template. Any number of legs can attach to the spider body through a simple bean
interface. The legs can contain filters to import/export XML documents, Java
code etc., to/from the spider body.



6.2 A Look Back at the Wish List 

When we match the functionality and architecture of BC-Prototyper to our wish 
list, described in Section 4.1, we find that many of our wishes are indeed fulfilled: 

— BC-Prototyper runs on all Java platforms because it is written in Java.

— It certainly does not provide full UML support. Its extensible architecture 
would allow for addition of Object Interaction Diagrams, State Diagrams, or 
any other UML feature quite easily. To develop BC-Prototyper into a full tool 
instead of a prototype, these functions must be added. The code generation 
for SanFrancisco application development totally depends on a static object 
model, which is provided with BC-Prototyper, as shown in Fig. 2. 

— Java development support is primitive. The same argument applies here as 
for the previous point. 

— BC-Prototyper does support components. And it does so in a more trans- 
parent way than other tools the author is aware of. 

— XML is used both for storing the model itself, and for encoding code gener- 
ation templates. 

— A knowledgeable user could adapt the meta-model or the generation tem- 
plates. In this way the tool can be customized and extended. 

— Technology changes can be cared for in the same way as the previous point, 
as long as Java is in the center of it. 

— BC-Prototyper is one integrated tool, but configurable, by deciding which 
plugin components will be included at startup. 

— We did not mention it in the functional description, but changes can be
preserved by the tool. Generated code can be imported into any other IDE, 
changed, and imported back into BC-Prototyper. Code changes will be put 
back into the model at the right spot, provided the user has indicated changes 
with a change tag. 

— Integration with SanFrancisco is provided for as described in Section 5. 

7 Conclusion 

This article described SanFrancisco, a new, large, Java-based component
framework. We looked at the architecture of SanFrancisco, and derived from it the
features an effective development tool for Business Component frameworks should
have.

We found that, although not complete yet, Business Component Prototyper
can provide the necessary features to become a good development tool
for Java framework applications in general and for SanFrancisco in particular.
BC-Prototyper has functionality which is usually not found in other
development tools, such as generation of complete, running prototypes, including GUI,
and tight integration with SanFrancisco.

Abstract. The World Wide Web is likely to become the standard platform
for future generation applications. Specifically, it will be the uniform
interface for sharing data in networks, both Internet and intranet. From
a database point of view, there are two main directions of interest: the
first is related to the extraction of information from the Web (possibly
in order to manage it by means of database techniques and tools),
and the second to database support (again in terms of both methods
and tools) for the management of data to be offered through the Web. A
major, preliminary issue is the method used for the description of data,
where the discussion is between structured and semistructured approaches.
Interesting developments will probably be stimulated by the advent of
XML.

1 Introduction 

It is now generally agreed that database research has produced very interesting
results, which have provided the basis for a very solid technology (interesting reports
on the issue have appeared in the last ten years: Silberschatz et al. [39,40] and
Bernstein et al. [12]). Databases were conceived and developed back in the 1960s
and 1970s in order to support the integrated management of data in business
applications, such as banking, reservations, personnel, and inventory management.
In a few words, we can say that the goal of database technology is to produce
systems for the management of large volumes of shared and persistent data
with reliability (ensuring that data is not lost) and power (in terms of both
efficiency, say, throughput and response time, and effectiveness, say, usability
and flexibility). Major achievements in this respect were obtained in the
1980s, with the development and acceptance of relational systems, which offer
high-level languages and an associated technology for reliable and efficient
transaction management, both in centralized and distributed frameworks. Properties
of databases are now discussed at length in textbooks, such as Atzeni et al. [3]
or Elmasri and Navathe [19].

In the last decade, it has been recognized that the scope of database tech- 
nology is much larger than the basic business world that supported its initial 
development (see the aforementioned reports [39,40] for extensive discussions).
The continuous reduction in the price/performance ratio for all hardware com- 
ponents has made it possible to apply database ideas to many applications, with 
more complex data, such as text, images, video, and scientific data sets. Then, 
the explosion of the Internet and of the Web has produced an even bigger spur 
to database technology: the Web offers the potential for integrating data from 
an enormous number of sources and for connecting huge numbers of clients (not 
only human users, but small powerful tools embedded everywhere, say in cellu- 
lar phones, cars, domestic appliances). In this framework, the recent so-called 
“Asilomar report” (Bernstein et al. [12]) recommends the following as a ten-year
goal for the database research community: “The Information Utility: Make it 
easy for everyone to store, organize, access, and analyze the majority of human 
information online.” With the assumption that the majority of human informa- 
tion will be on the Web within a decade, this would mean that it will be through 
the Web (not necessarily today’s Web, but what the Web will be by that time) 
that all this should happen. In summary, we could say that information systems
will be “Web-based,” in the sense that the Web will be one of their interfaces
(possibly the major or even the only one). In fact, the notion of a “Web information
system” has been proposed, mainly to stress the original features and require- 
ments that are emerging (Isakowitz et al. [28] present a collection of articles on 
the topic). 

Bernstein et al. [12] mention a number of general themes of interest for future 
research in the database field: 

— development of “plug and play” database systems, aimed at simplifying the
management and administration and at supporting the understanding of the
available information, and the integrability of autonomous sources;

— scalability of the approaches to federated database systems (conceived for 
the cooperation of a few systems, but now potentially involving thousands or 
even millions); this refers to both performance issues and semantic aspects, 
for example in the management of heterogeneous or irregular information; 

— general revision of the overall architecture of database systems, because of 
the evolution of hardware and of the growth of the size of databases; 

— better integration of data and processes, both with respect to conventional 
applications and to newer approaches, such as workflows and business rules; 

— integration of structured and semistructured data, mainly as a consequence 
of the advent of XML. 

Most of these themes include aspects that are directly related to the specific role
databases have in Web information systems (WIS).

In this paper, we will try to point out some of the activities in the research
field that applies database methods and techniques to WIS,
as well as some of the lines currently being pursued. A word of caution is needed:
given the high dynamicity of the field, we do not aim at a complete survey; we
will mainly discuss the issues on the basis of the experience we had at Università
Roma Tre within the Araneus Project (Atzeni et al. [6,7], Mecca et al. [33,35]).
For other references on the subject, we mention a survey by Florescu et al. [22]
and the proceedings of two recent workshops (Atzeni et al. [9] and Cluet and
Milo [15]).

We will begin by discussing in Section 2 the idea of Web information systems. 
Then, the major research issues will be described in Section 3, and specifically: 
in Section 3.1 we will discuss the role models play in this field, in Section 3.2 we 
will consider the problem of extracting data from the Web and in Section 3.3 
the converse problem of designing and maintaining a Web site. We will conclude 
in Section 4 by briefly mentioning the impact XML might have on the field. 



2 Web Information Systems 

As we said in the introduction, it is probably the case that in a few years all 
information systems will be Web-based. At that point, the notion of Web information
system could become so pervasive as to be irrelevant (in the sense that no
system would be outside this category). However, we believe that it deserves 
our attention now, for at least two reasons. First, the evolution of Web-based 
systems from simple, hand-produced sites to complex information systems has 
been gradual, and the various stages show some individual interest, in terms of 
requirements, methods and tools. This is especially true with respect to data 
management issues. Second, the Web has offered opportunities for new systems 
and services that would not have been possible in other environments. 

In the same way as information systems are sometimes classified on the basis 
of their features, we believe it can be useful to classify WIS according to the 
complexity of the involved data and processes as shown in Figure 1 (Mecca et 
al. [35]). We discuss this classification by referring also to the way Web sites have
evolved in the few years that have passed since the Web’s invention (see also Atzeni et
al. [3, Section 14.2]).

Indeed, the first Web sites were composed of manually crafted hypertexts,
prepared with the goal of making information available. Pages were generally
prepared in an ad-hoc way, possibly using information already available (although
often to be reorganized). They would usually fit within the lower-left category
(Web-presence sites, with low complexity of both data and processes). Many
sites with these features still exist, mainly with marketing or basic advertising
goals. In some cases, they are used as “entrance gates” to other systems.

Following the initial goal of the Web, the complexity of the data to be
disseminated grew, and many catalogue sites (upper-left part of the diagram)
appeared. In this framework, it was soon realized that it could be useful to
store in databases the information to be “published” on the Web, for a number 
of reasons: on the one hand because the database could already exist (or at least 
it could be created with multiple objectives, one being publication on the Web); 
on the other hand, a database could support the management of changes in data 
(especially if the structure and appearance of the hypertext stay unchanged) as 
well as a non-redundant support for redundant data (the Web can be seen as 
a “hypertextual view” over the database, and it is often the case that
redundancies appear in the Web, at least to provide support for navigation, whereas
databases tend to avoid them).

Fig. 1. Classification of Web Information Systems

At the same time, the Web spurred the development of many services, along 
the lines already being pursued in the Internet world: the Web offered a much 
more friendly interface, and therefore there was an immediate explosion of initia- 
tives. Services of various types appeared: from spontaneous discussion groups to 
search engines to free email providers. Here (despite the fact that there is often 
a database in the background) the visible complexity is only in the processes: 
we are in the lower-right portion of Figure 1. 

The route towards fully-fledged WIS (upper-right part) is still to be com- 
pleted, and database research has considered only in a marginal way the com- 
plexity of processes and applications. Indeed, attention has been mainly on fea- 
tures that extend, to the Web world, database services that are popular in tradi- 
tional information systems. However, some steps have been made. A significant 
category is represented by many catalogue sites that allow for updates and trans- 
actions, at least from two points of view. On the one hand, modifications to the 
catalogue are often allowed by means of pages similar to those used for
publication. This is not as minor a point as it might appear at first sight, because
content maintenance is a common problem in many Web sites (especially from 
the organizational point of view) and also because this was a first step toward the 
idea that most (if not all) interfaces could be Web-based. Usually, this feature 
remains within the domain of the organization that owns the site, but can be 
physically distributed, without any difficulty. A second, more glamorous, direc- 
tion of extension for catalogues has been the growth of “electronic commerce,” 
with a tremendous impact on the popularity of the Web, providing an indication
of how the Web could change the way many activities are conducted (an
interesting collection of articles has recently been edited by Dogac [18]). Indeed, 
there are two major forms of electronic commerce: the most popular among indi- 
vidual Internet users is probably that related to catalogue sales (which could be 
seen as an evolution of traditional mail order services), often referred to as busi- 
ness to customer (B2C), whereas the most important with respect to volumes 
of exchanges and amount of money is that involving organizations at both ends, 
business to business (B2B). 

More general issues related to WIS are still emerging, and we believe that, 
for the time being, it would be difficult to indicate general topics, except for a 
comprehensive need for the integrated management of data and processes. This 
is indeed a recurring theme in database research, as it was one of the motivations 
for the development of object-oriented databases (see for example Bancilhon [10]) 
and of active-rule components for databases (Widom and Ceri [42]). As we men- 
tioned in the introduction, the Asilomar report [12, Section 3.4] also emphasizes 
this requirement, indicating topics such as workflow support (of business rules), 
component models (such as CORBA, OLE, Jini), visual programming
methodologies, and persistent programming languages.

A major aspect that needs to be mentioned is the fact that the Web (and 
more generally the Internet) provides access to many autonomous systems, which 
could interact with one another. Indeed, the idea of “cooperative information sys- 
tems” has been proposed with reference to existing systems that cooperate (see 
De Michelis et al. [17]); in the Web world we can say that, in principle,
all systems cooperate. Cooperation can be loose (at the extreme, lists of links to
autonomous sites or services) or tight (integrated interfaces to heterogeneous 
systems, with differences made transparent). For example, the B2B form of elec- 
tronic commerce requires a form of cooperation between the information systems 
of the involved enterprises. Similarly, most on-line catalogues (in a B2C system) 
are indeed gateways towards multiple autonomous systems. 



3 Databases in WIS 

A feature that is common to most of the applications we mentioned in the previous
section is the need for an integrated management of data of various kinds,
from traditional “database-data” to data embedded in documents (for example 
HTML ones). We have introduced the term Web-bases (Mecca et al. [33]), as 
collections of data of heterogeneous nature, with various degrees of semistruc- 
turedness (see Suciu [41] for references and discussion), from highly structured 
to unstructured. More generally, the Web involves many information sources, 
which could be more or less structured, and the major need is to let these 
sources cooperate. In the process of organizing cooperation, if we mainly refer
to the complexity of data, a crucial issue that arises is the exchange of data,
and its transformation, for example from a structured form (say, a database) to
an unstructured one (Web pages) or vice versa. In other words, an interesting
goal could be the following: “treat database data as if it were Web hypertexts,
and vice versa.” Indeed, databases and hypertexts each have their own advantages
and disadvantages: in some cases navigation over hypertexts is what users need
(and the data may be in non-browsable databases), in other cases users need to
correlate, sort and aggregate (in a word, “query”) data (and they have it
embedded in documents). Therefore, a fundamental feature that is needed for
the management of Web-bases is support for translations: from databases
to Web sites and vice versa. We have already argued for the need of the former
when we said that most Web sites (catalogues, for example) are supported by
databases; as regards the latter, let us just mention the issue of Web farming
(discussed at length by Hackathorn [25]): this is the process of feeding data
warehouses, that is, special databases used for decision support (see Inmon [27]),
with data extracted from the Web.
Therefore, we will devote our attention to the two basic steps:

— from Web-hypertexts to databases: querying and extracting data from the 
Web; 

— from databases to hypertexts: generating and managing Web sites. 

With respect to Web sites, the two steps could be described as bottom-up
(extracting data) and top-down (producing HTML pages), respectively.

A third major aspect often cited (Mecca et al. [33], Florescu et al. [22]) 
in this framework is that related to integration. We have already mentioned 
its importance; however, we believe that here the challenges specifically related 
to the Web are mainly due to the transformations from one form to another, 
rather than to integration itself (which is a complex problem indeed, regardless 
of whether the sources are on the Web or not). Therefore, for the sake of brevity, 
we will not comment more on this issue; the interested reader can consult the 
above sources, especially the survey by Florescu et al. [22], for more information. 



3.1 Data Modeling in WIS 

A general point to be discussed here is the role abstraction plays in this frame- 
work; abstraction is a major feature in the database field, as it forms the basis 
for modeling and therefore describing the common features of similar database 
items. As we mentioned in the previous section, data in the Web is often unstruc- 
tured or semistructured, in the sense that it shows little or no regularity. As a 
consequence, the Web has sometimes been described by means of a very simple,
graph-based data model (Mendelzon et al. [36]): Web pages are the nodes of
the graph and hypertextual links are its edges; all internal structure (if any) 
is ignored. This is clearly possible in all cases, but does not help much in the 
organization of the process of extraction of data from Web sources. In fact, it is 
often the case that Web sites do show some degree of structure. At the oppo- 
site extreme, one can see that some Web sites show (at least in some of their 
pages, or in portions of them) well structured pieces of information (for example 
organized as tables). There are various reasons for having more structured sites:
from the owner’s point of view, this may just be a consequence of the fact that the site is
generated automatically by means of programs, and therefore the organization
of pages is regular; the motivation can also be deeper, and in the user’s interest: 
regularity in the structure helps in the navigation process. With this in mind, 
in the Araneus project (Atzeni et al. [6]) we found that it would be useful to 
have a model for the description of Web sites. Initially, this was conceived as a 
basis for querying: in fact the idea is that structure (and data models) can be
used as documentation of a site (to support the user’s understanding of the
content), as a basis for querying (from the user’s point of view) and as a basis for
query optimization (from the system’s point of view). Soon after that, we realized
that modeling could be useful also in the process of generating a site with some 
regularity in the structure (as it is often needed in catalogue sites, as we said 
in Section 2). While referring the reader to our papers (Atzeni et al. [6], Mecca 
et al. [33]) for examples, we comment here on the major features of our model, 
called the Araneus Data Model (ADM). It is a “complex-object data model,” in
the spirit of ODMG or SQL3. Its major construct is the page scheme, whose
instances, the pages, are essentially complex objects, with the URL as identifier and
simple or complex attributes. Simple attributes can be ordinary values (num- 
bers, text, images, etc.) or links, which are loosely typed (in the sense that the 
destination page can belong to one of a set of given page schemes). Complex 
attributes are tuples or lists. The model also provides constructs for modelling 
Web-specific features, such as maps and forms. 
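
The constructs just described can be made concrete with a minimal Python sketch. It is not ADM syntax and not part of the Araneus tools: the class names (`Link`, `Page`), the helper `check_link`, the two example schemes and the URLs are all illustrative assumptions, as is the choice of the URL as page identifier.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Link:
    """A loosely typed link: the target may belong to any of several schemes."""
    anchor: str
    target_url: str
    allowed_schemes: frozenset  # names of the page schemes the target may instantiate


@dataclass
class Page:
    """An instance of a page scheme; the URL is used here as the identifier."""
    scheme: str
    url: str
    attributes: dict = field(default_factory=dict)


def check_link(link, pages_by_url):
    """True if the link target exists and instantiates one of the allowed schemes."""
    target = pages_by_url.get(link.target_url)
    return target is not None and target.scheme in link.allowed_schemes


# Hypothetical two-page site: an author page holding a list of (title, link) tuples.
paper = Page("PaperPage", "http://example.org/p1", {"title": "Araneus"})
author = Page(
    "AuthorPage",
    "http://example.org/a1",
    {
        "name": "A. Author",  # simple attribute
        "works": [            # complex attribute: a list of tuples
            ("Araneus", Link("paper", paper.url, frozenset({"PaperPage", "BookPage"}))),
        ],
    },
)
site = {p.url: p for p in (paper, author)}
```

Calling `check_link` on the link stored in `author.attributes["works"]` succeeds because its target is a `PaperPage`, one of the two schemes the link admits; this is the sense in which links are loosely typed.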

Other authors have advocated the adoption of semistructured data models:
this is an idea that was originally conceived as the basis for the integration of 
heterogeneous data sources (Papakonstantinou et al. [37]) and makes use of a 
representation of data by means of labeled directed graphs (see Abiteboul [1] 
and Buneman [13]), where labels represent “attribute names,” in database ter- 
minology, and there is no restriction on the labels; therefore, at least in principle, 
there is no schema on the data. An advantage of this approach is that all sets of 
data, no matter how irregular, can be easily represented. The main disadvantage 
is that there is no way to exploit regularity, when there is some. We believe that 
there is no definite answer on whether a structured or a semistructured approach
is to be preferred, in general. The major issue is to be able to take advantage of 
regularity and structure whenever possible. 
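
As a rough illustration of the trade-off, the following Python sketch stores a small data set as a labeled directed graph. The edge list, the node names and the helper functions are assumptions made for this example; they do not reproduce any particular semistructured model from the cited work.

```python
# Edges of a labeled directed graph: (source node, label, target).
# Targets are either node identifiers or atomic values (leaves).
EDGES = [
    ("root", "person", "n1"),
    ("root", "person", "n2"),
    ("n1", "name", "Ada"),
    ("n1", "email", "ada@example.org"),
    ("n2", "name", "Bob"),
    ("n2", "phone", "555-0100"),  # n2 has a phone but no email: nothing forbids it
]


def children(node, label):
    """Targets reached from `node` along edges carrying `label`."""
    return [t for (s, l, t) in EDGES if s == node and l == label]


def common_labels(nodes):
    """Labels that every given node carries: the regularity a schema would state."""
    label_sets = [{l for (s, l, _) in EDGES if s == n} for n in nodes]
    return set.intersection(*label_sets) if label_sets else set()


people = children("root", "person")
emails = [e for p in people for e in children(p, "email")]
```

Irregular objects are stored without difficulty, but the only regularity available to a query processor is what a function such as `common_labels` can discover after the fact, which is the disadvantage noted above.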

As we will discuss in the next subsections, we propose the use of a data model 
both in the top-down and in the bottom-up step. It is important to note that,
in the latter case, the model can be used as an abstraction tool, in the sense that
not all aspects need be represented: this is obvious if we refer to graphical or 
presentation features, but can also be important if there are details (possibly 
unstructured) that turn out to be marginal (or may be difficult to model): they 
can simply be ignored. 



3.2 From the Web to Databases: Extraction and Querying

Various approaches have been proposed to query the Web and to extract informa- 
tion from its sites. The first proposals (Konopnicki and Shmueli [30], Mendelzon 
et al. [36], Lakshmanan et al. [31]) considered the Web according to the graph
model, as we discussed in the previous section, and therefore allowed conditions
on paths on the graph as well as simple operations (or calls to external programs)
within pages. The major limitation of these proposals was the difficulty
of taking into account the internal structure of pages. Also, the results of queries
were essentially lists of pages. Subsequent proposals (including WebOQL, by
Arocena et al. [2], StruQL, by Fernandez et al. [20], and our proposal Ulixes,
Atzeni et al. [4,6]) aimed at taking the internal structure of pages into account
and at providing a structure for their results as well.
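
To illustrate what a query under the pure graph model amounts to, here is a small Python sketch of a path-bounded search with a page-level condition. It does not use the syntax of any of the languages cited; the site, its URLs and the function `reachable_pages` are hypothetical.

```python
from collections import deque

# Hypothetical site: URL -> outgoing links, and URL -> page text.
LINKS = {
    "/": ["/dept", "/news"],
    "/dept": ["/dept/staff"],
    "/news": [],
    "/dept/staff": ["/"],
}
TEXT = {
    "/": "Welcome",
    "/dept": "Department of databases",
    "/news": "No news",
    "/dept/staff": "Staff of the databases group",
}


def reachable_pages(start, max_depth, predicate):
    """Breadth-first search: pages within `max_depth` links of `start`
    whose URL satisfies `predicate`, in order of discovery."""
    seen = {start}
    queue = deque([(start, 0)])
    result = []
    while queue:
        url, depth = queue.popleft()
        if predicate(url):
            result.append(url)
        if depth < max_depth:
            for nxt in LINKS.get(url, []):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, depth + 1))
    return result


hits = reachable_pages("/", 2, lambda url: "databases" in TEXT[url])
```

The answer is a plain list of page URLs. Nothing inside a page is addressable, which is the limitation the later proposals set out to remove.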

The crucial point here is not deciding which could be the best
model or the best language, but finding a suitable mapping from Web
pages to the language itself: we need to transform unstructured
(or at best, loosely structured) pages in order to be able to extract from them 
the pieces of data that are embedded. This is the task of wrappers (Hammer et 
al. [26]). In our approach, querying is based on ADM, which is a logical model, 
and wrappers are needed in order to map logical accesses to attribute values 
in ADM objects to physical accesses to HTML text. With respect to wrappers, 
after a first proposal based on a procedural language inspired by editor based 
operations, such as “cut and paste” (Atzeni and Mecca [5,32]), we resorted to a 
grammar-based formalism with exceptions (Crescenzi and Mecca [16]): it allows 
a declarative description of pages (using a context-free grammar), but can handle 
irregularities by means of an exception handling mechanism. This has turned out 
to be a reasonable compromise in the description of sites. 
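
The idea of a regular description plus exceptions can be sketched in a few lines of Python. This is only an approximation: it uses regular expressions in place of a context-free grammar, and the page layout, the attribute names and the function `wrap_book_page` are invented for the example rather than taken from the cited formalism.

```python
import re

# Regular layout expected on most pages (a stand-in for a grammar production).
BOOK_RULE = re.compile(
    r"<h1>(?P<title>[^<]+)</h1>\s*<p>Price:\s*(?P<price>\d+(?:\.\d+)?)</p>"
)
# Exception: pages that deviate from the regular layout in a known way.
BOOK_EXCEPTION = re.compile(r"<h1>(?P<title>[^<]+)</h1>\s*<p>Out of print</p>")


def wrap_book_page(html):
    """Map physical HTML text to the attribute values of a logical object."""
    match = BOOK_RULE.search(html)
    if match:
        return {"title": match.group("title").strip(),
                "price": float(match.group("price"))}
    match = BOOK_EXCEPTION.search(html)
    if match:
        # Exception handler: the price attribute is absent on such pages.
        return {"title": match.group("title").strip(), "price": None}
    raise ValueError("page matches neither the rule nor its exception")


regular = wrap_book_page("<h1>Web Data</h1><p>Price: 12.50</p>")
irregular = wrap_book_page("<h1>Old Book</h1>\n<p>Out of print</p>")
```

Pages that fit neither pattern are rejected rather than silently mis-extracted, so each new irregularity has to be added as an explicit exception.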

At this point, the query process can be formulated by means of a query
language with an SQL-like syntax. The result, in our
current prototype, is a relational table. Extensions to a complex-object model
could easily be produced, but they would not add much to the major contribution
of the approach, which lies in the possibility of extracting “database data” from
hypertexts; the major feature is therefore wrapper technology.



3.3 From Databases to the Web: Generating and Managing Sites

With respect to the “top-down” problem, generating Web sites from databases,
it is important to say that there are now many tools available on the market.
However, we believe that these tools mainly provide very low-level support (in
terms of HTML development) or, at best, simple mappings that allow one to build
Web interfaces to database data, but with no systematic support, especially with 
respect to the overall organization of the site. 

It turns out that most Web sites do not satisfy user needs: the informa- 
tion kept is poorly organized and difficult to access; also it is often out-of-date, 
because of obsolete content and broken links. In general, this is a consequence of 
difficulties in the management of the site, both in terms of maintaining the struc- 
ture and of updating the information. Many Web sites exist that have essentially 
been abandoned. 

We believe that this situation is caused by the absence of a sound methodolog- 
ical foundation, as opposed to what is now standard for traditional information 
systems. In fact, Web sites are complex systems, and, in the same way as every
other complex system, they need to be developed in a disciplined way, organized
in phases, each devoted to a specific aspect of the system. In a Web site,
there are at least three components (see Atzeni et al. [8], Fernandez et al. [21],
Ceri et al. [14]): the information to be published (possibly kept in a database);
the hypertextual structure (describing pages and access paths); the presentation 
(the graphical layout of pages). These issues have led us to develop a method- 
ology (Atzeni et al. [7]) that is based on a clear separation among three well 
distinguished and yet tightly interconnected design tasks: the database design, 
the hypertext design, and the presentation design. Now, it is widely accepted 
that data are described by means of models and schemes, at various levels, for
example at the conceptual level, with the ER model, and at the logical level, with the relational
model (see Batini et al. [11] for a description of design methods). In a data- 
intensive Web site, it is common to have large sets of pages that contain data 
with the same structure (coming from tuples of the same relation, if there is 
a database): therefore, as we have already argued, we believe that models are
useful for the description of hypertexts. We have already briefly illustrated ADM, to be
considered a logical model for hypertext (the counterpart, in this framework, of 
the relational model for data). However, given the various facets that can arise, 
we believe that hypertexts also need, in the same way as data, to be described at 
a higher, conceptual level. Therefore, our methodology makes use of a conceptual 
model for hypertexts (called NCM, the Navigation Conceptual Model), which is
essentially a variation of the ER model suitable to describe hypertextual features 
in an implementation independent way. In summary, the various phases of the 
methodology are shown in Figure 2 with the precedences among them and their 
major products (schemes according to appropriate models). 

It is worth noting that other proposals have been recently published that 
present some similarities, though with a less detailed articulation of models (Fra- 
ternali and Paolini [23], Fernandez et al. [21]). The origins of the methodological 
aspects can be traced back to previous work on hypermedia design (Garzotto et 
al. [24], Isakowitz et al. [29], Schwabe and Rossi [38]). Most of these approaches, 
including ours, also provide languages, or at least tools, for the generation of Web 
sites from the declarative description of their structure, by supporting the mapping
from the database to the site. This direction of mapping is clearly easier than 
that discussed in the previous section with respect to wrapping, because every- 
thing is under control. The tools also support the separation between structure 
and presentation, by providing some forms of template pages that describe the 
common appearance of pages with the same structure (in our approach, all the
pages with the same page schema [9]). 
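
The role of template pages can be illustrated with a short Python sketch that keeps content (database rows) apart from presentation (one template per page scheme). It uses only the standard library; the template, the column names and the function `generate_site` are assumptions for the example and do not correspond to any of the tools mentioned.

```python
import html
from string import Template

# One template shared by all pages with the same structure (hypothetical layout).
PERSON_TEMPLATE = Template(
    "<html><head><title>$name</title></head>\n"
    "<body><h1>$name</h1><p>Department: $dept</p></body></html>"
)


def generate_site(rows):
    """Map each database row to a (file name, HTML text) pair."""
    pages = {}
    for row in rows:
        # Escape values so that data cannot break the page markup.
        safe = {key: html.escape(str(value)) for key, value in row.items()}
        pages[f"person_{row['id']}.html"] = PERSON_TEMPLATE.substitute(safe)
    return pages


# Hypothetical tuples of a "person" relation.
pages = generate_site([
    {"id": 1, "name": "Ada & Co", "dept": "Databases"},
    {"id": 2, "name": "Bob", "dept": "Theory"},
])
```

Changing the appearance of every person page then means editing one template, while the rows and the hypertext structure stay untouched.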

4 The Next Step: XML 

We conclude by briefly noting that the recent introduction of XML, which could
emerge as a new standard for data representation and exchange, is stimulating
a lot of interest in the database research community. Two major features are
relevant here: first, XML offers (by means of DTDs) the possibility of describing
the content of Web pages more accurately than HTML; second, XML provides
a separation between content and presentation (for example by means of
XSL, the Extensible Stylesheet Language). As we have recently discussed (Mecca et
al. [34]), both these features can be exploited in an interesting way as extensions
of current approaches. In particular, DTDs and descriptions of sources can be
very useful in supporting the automation of the wrapping process. However, a
major problem would remain: the need to understand and use
the structure provided by an external, autonomous source. Also, the difficulty
of choosing between a structured and a semistructured approach would not be
reduced.

Fig. 2. The Araneus Design Methodology

Abstract. There is undoubtedly a huge gap between the level of formality
currently in use in mainstream software engineering and the
“best practice” advocated by academics and practised by a small sector
of industry involved in critical applications. This paper presents some 
results of recent research which are building bridges between these two 
approaches: on the one hand, developing formal methods which are useful 
to mainstream developers; and on the other, underpinning mainstream 
methods with formal foundations. 



1 Introduction 

Formal Methods are now becoming an established technology for software development
in several sectors of the industry, particularly for systems which are
critical with respect to safety or finance. Particular domains where there has 
been a significant uptake are critical instrumentation and control systems in 
military and civil avionics systems, terrestrial transportation systems, nuclear 
control systems, and space systems. Although this is a large industrial market 
sector, estimated as 3 Billion Euro annually within Europe, there is still much 
work to be done if formal techniques are to become a major force in mainstream 
industrial software development. 

The formal techniques which have been the subject of the research described 
in this paper have been available for some years, but technology users have 
been reluctant to fully adopt what has been offered on the market, as the sup- 
plied technologies lacked sufficient track record of use, were perceived as difficult 
and costly to use, and required highly specialised mathematical expertise in the 
development team. 

In this paper, I give a synopsis of the results of a number of research projects 
which have made some progress towards providing evidence of the benefit of 
following the formal approach. This evidence concerns the quality of the delivered 
system and the cost of its development and is both qualitative and quantitative 
in nature. 

The formal methods considered cover a variety of techniques which can be 
summarised as follows: 

— the use of Mathematical Models for specification of software, 

— the use of Proof for calculation of properties of the specified software,

— the use of Stepwise Refinement to ensure the preservation of these properties
by the design decisions made during development, and 

— the generation of Test Cases from specifications to complement formal proof 
in verification of the implementation. 

The formal methods employed in the projects covered by this paper are 
primarily the “model-oriented” formal methods including: 

— VDM [33], the Vienna Development Method, originally developed at IBM
research laboratories in Vienna during the 1970s, and subsequently
at Manchester University, the Technical University of Denmark, RAL and
IFAD,

— Z [42], developed at Oxford University during the 1980s, and 

— B [1], also emanating from Oxford and then developed in several industrial 
organisations including BP and GEC Alsthom during the 1980s and 1990s.

This paper outlines three lines of research undertaken in the author’s group
at RAL, presents some results from one of those lines, and suggests areas for
ongoing further work. The next section identifies the three key lines of research
and outlines a number of the research projects undertaken over the last decade.
Then Section 3 presents two of these projects in more detail, together with some
key results from them. The last section describes some ongoing work and draws some
conclusions.

2 Three Converging Lines of Research 

This section outlines a number of research projects and identifies three areas of 
research which are key to enabling the uptake of formal methods. The three lines 
of research are: 

1. the Advancement of Formal Methods through the development of tools and
techniques supporting particular methods,

2. the Technology Transfer of Formal Methods through their application in
developments in collaboration with industry, and

3. the Formalisation of Industrial Methods, in particular diagrammatic methods
for Object Oriented Analysis and Design.

These three lines of research contribute respectively to the practical feasibility 
of using formal methods in real-life software development; to the body of evidence 
for the costs and benefits of adopting them; and to the reduction of the level of 
specialisation required by the personnel employing them.

The projects covered are summarised in the following sections. 



2.1 Advancement of Formal Methods 

The first line of research is concerned with the advancement of formal methods 
themselves and the development of technology to support their use. These projects
developed tools and standards for formal methods, both prerequisites to the
industrial uptake of a technology. 

The Mural project (1986-1990) developed a VDM specification support 
tool and proof assistant. Working with Cliff Jones’ group at Manchester Univer- 
sity, this project produced a tool supporting VDM developments and a generic 
proof assistant supporting the construction of proofs arising in those developments
[23,25,24]. The resulting tool was among the first to use a graphical user
interface to a theorem proving assistant and has since inspired a number of 
projects involving user interface design for theorem provers. 

Proof in VDM. The Mural project led to the definition of an axiomatic 
semantics for VDM and a “Practitioner’s Guide” to proof using that seman- 
tics [15]. A number of case studies in proof using that axiomatic semantics were 
also developed [7]. It also led to a line of research into the use of read and write
frames in operation decomposition [9,10,8]. 

The ZIP project was a collaboration between British Aerospace, British 
Petroleum, IBM, Logica, Praxis, the University of Oxford and the Rutherford 
Appleton Laboratory. This project developed a semantics and proof theory for Z, 
which forms the basis of the current Draft of the British and ISO Standards for Z. 

Verification Techniques for LOTOS (1990-1992) was a study of the theory
and practice of the verification of specifications in the ISO standard LOTOS
language. RAL’s contribution to this project was in the development of the 
ERIL term rewriting formal reasoning system [37] and in undertaking a case 
study which used LOTOS to specify parts of the graphics standard GKS [29]. 
The other partners in the project were Royal Holloway and Bedford New College, 
Glasgow University, and British Telecommunications. 



2.2 Technology Transfer of Formal Methods 

The second line of research concerns the technology transfer of formal techniques 
into industrial practice. This has been achieved through three industrial collab- 
orative projects which have employed and assessed the VDM and B methods. 
These projects provide evidence of the cost/benefit of the use of formal methods 
necessary to reduce the risk of their uptake. 

The B User Trials project (1992-1995) was the first UK industrial project 
involving the B-Toolkit. It was a collaborative project between RAL, Lloyds 
Register of Shipping, Program Validation Limited and the Royal Military College 
of Science and played a major part in bringing the B-Toolkit up to industrial 
quality. RAL undertook two case studies [22,41] comparing B with VDM and Z 
respectively, and investigated the methodology employed in the construction of 
proofs [19,26]. A summary of the project appears in [11]. 

The MaFMeth project (1994-1995) was the first project to bring VDM 
and B together to exploit their different strengths. A collaboration with Bull 
Information Systems, it was an “application experiment” assessing a methodology
covering the whole life cycle, combining the use of VDM for early lifecycle development
with B for refinement and code generation. It demonstrated the commercial
viability of the use of formal methods by collecting quantitative evidence
of the benefits in terms of fewer faults made in development and early detection
of faults. Qualitative experience is reported in [14] and a quantitative analysis of 
the results in [13]. In particular it was found that animation, test case generation 
and proof are all cost-effective ways to find faults in formal texts [12]. 

The Spectrum project (1997) was a feasibility study into the commercial 
viability of integrating the VDM-Toolbox and B-Toolkit. The evaluation was
undertaken from three perspectives: the industrial benefit of using the combined 
tool, the technical feasibility of the combination of the two tools and the com- 
mercial case for the development of a combined tool. RAL’s partners were GEC 
Marconi Avionics, Dassault Electronique, Space Software Italia, the Commissariat
à l’Energie Atomique, the Institute of Applied Computer Science (IFAD)
and B-Core UK Ltd.

Further details of the MaFMeth and SPECTRUM projects are given in
Section 3.



2.3 Formalisation of Industrial Methods 

These projects concerned providing some formal underpinning to techniques 
already established in industry, thus providing a low-cost entry route to the 
uptake of formal methods. 

The TORUS project sought to raise the level of reuse in industrial process 
control projects, by defining and deploying reuse based work processes together 
with supporting tools, thus improving the efficiency over the whole design lifecycle.
The project employed the ISO standard data modelling language STEP-EXPRESS.
As part of the work, we undertook investigations into the formalisation
of this language, and made suggestions for clarifications to its semantics
[20,18]. The project was in collaboration with Cegelec Projects Ltd., ELS
Automation, and Alcatel ISR. 

Formal Underpinning for Object Technology (FUOT). This research
with Imperial College London and the University of Brighton undertook the 
formalisation of the conceptual basis of diagrammatic Object Oriented Analysis 
and Design notations such as UML using the “Object Calculus” as a semantic 
framework [17]. It also analysed the notations and typical development steps in 
order to suggest improvements to make these techniques more sound scientifi- 
cally. The work also raised some issues about subsystems which are the basis of 
ongoing research [16] (see Section 4.2). 



3 Results of Technology Transfer Projects 

This section presents in more detail one strand of research undertaken in the 
second of these research lines: the technology transfer of formal methods. It 
describes two projects, MaFMeth and Spectrum, which have both been concerned
with the assessment of the use of a combination of the VDM and B Methods. 
The next section presents some key results from those projects.



3.1 Heterogeneous Development Using VDM and B

VDM and B are two of the most industrially used formal methods. Both are 
model-oriented methods for the development of sequential systems based on first 
order calculi and set theory. Both have a set of proof rules defined for formal 
verification and validation. Both have a formal semantics: for B this is defined 
in terms of weakest preconditions, for VDM it is denotational. Both have been 
used for a variety of applications and are supported by commercial toolkits. 



VDM — The Vienna Development Method. VDM’s origins lie in the definition
of programming language semantics in the 1970s, but for many years
it has been used in systems development generally [33], and there is now an 
ISO standard [32] of the specification language VDM-SL. It has a rich set of 
data type constructors, augmented by invariant predicates. Functions and state- 
transforming operations can be defined explicitly using a large expression and 
statement language or implicitly in terms of precondition and postcondition 
predicates. 
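
The difference between an explicit and an implicit definition can be illustrated in Python. The sketch below is not VDM-SL: the decorator `implicit_spec` and the example function `isqrt` are assumptions made for illustration, and checking the predicates at run time is a much weaker activity than discharging proof obligations.

```python
import functools


def implicit_spec(pre, post):
    """Attach a precondition/postcondition pair to an explicit implementation."""
    def decorate(impl):
        @functools.wraps(impl)
        def checked(*args):
            if not pre(*args):
                raise ValueError("precondition violated")
            result = impl(*args)
            if not post(*args, result):
                raise AssertionError("postcondition violated")
            return result
        return checked
    return decorate


# Implicit part: WHAT the result must satisfy (integer square root).
@implicit_spec(
    pre=lambda n: n >= 0,
    post=lambda n, r: r * r <= n < (r + 1) * (r + 1),
)
def isqrt(n):
    # Explicit part: HOW one particular implementation computes it.
    r = 0
    while (r + 1) * (r + 1) <= n:
        r += 1
    return r
```

Any other implementation satisfying the same pair of predicates would be equally acceptable, which is why implicit definitions leave more freedom to later design steps.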

The VDM-SL standard includes a denotational semantics [36]. The semantics 
is based on the three-valued Logic of Partial Functions [34] which explicitly deals
with definedness of expressions and requires the demonstration of well-typing for 
the substitution of equals. In particular, the strong type system supports static 
detection of many well-formedness errors. A published proof theory, described 
in [33] and in greater detail in [15], supports the validation of VDM-SL specifi- 
cations through the discharge of proof obligations. 

One area of weakness in VDM relative to B is its lack of generally agreed 
large-scale structuring. The standard contains an “informative annex” describing
several alternative approaches to modules, including one implemented within the 
IFAD Toolbox, and other structuring proposals exist [31]. 

The IFAD VDM-SL Toolbox [30] is an industrial strength commercially avail- 
able tool which supports the ISO VDM-SL notation. The Toolbox includes a 
syntax checker, static semantic checker, and a pretty printer generating LaTeX 
output. In addition, it contains a debugger, an interpreter, and a C++ code generator.
It is also possible to perform test coverage analysis of specifications. An
earlier tool for VDM, VDM Through Pictures [27], developed by Bull Information
Systems, which supported the generation of VDM “skeletons” from Entity-Relationship
style diagrams, was used in the MaFMeth project.

Areas to which VDM has recently been applied include railway interlocking 
systems, ammunition control systems, semantics of data flow diagrams, message 
authentication algorithms, relational database systems and medical information 
systems. A directory of VDM usage examples is available [45]. 



The B Method. The B-Method [1] represents one of the most comprehensive
formal methods currently being promoted as appropriate for commercial use.
A development of VDM and Z, B was originated by Jean-Raymond Abrial whilst at
the Programming Research Group at Oxford University in the early 1980s, and
subsequently developed at British Petroleum Research (BP) and DIGILOG. Supported
forms are now commercially available from B-Core UK Ltd. and Steria Technologies
de l’Information.

The B-method employs the Abstract Machine Notation (B-AMN) which uses 
a notion of generalised substitution to represent state transformations, a style 
of specifying operations which is more “natural” to the programmer than the 
pre/post predicates of VDM. The B-method also has powerful structuring mechanisms
based on a notion of Abstract Machine which offers data encapsulation
allowing modular design and development of systems. 
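
As a loose analogy only, an abstract machine can be pictured in Python as a class that encapsulates its state, states an invariant, and defines each operation as a guarded state update. The class `BoundedCounter` below is a hypothetical example and not B-AMN; in particular, B establishes invariant preservation by proof, whereas this sketch merely checks it at run time.

```python
class BoundedCounter:
    """State: value. Invariant: 0 <= value <= limit."""

    def __init__(self, limit):
        self.limit = limit
        self.value = 0  # initialisation, read as the substitution value := 0
        self._check_invariant()

    def _check_invariant(self):
        if not (0 <= self.value <= self.limit):
            raise AssertionError("invariant violated")

    def increment(self):
        # Read as: PRE value < limit THEN value := value + 1 END
        if not self.value < self.limit:
            raise ValueError("precondition violated")
        self.value = self.value + 1
        self._check_invariant()


counter = BoundedCounter(limit=2)
counter.increment()
counter.increment()
```

The operation body is written as an assignment to the state, which is the substitution style the text describes as more natural to programmers than a pre/post predicate pair.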

B’s underlying semantics is grounded in weakest preconditions over untyped 
set theory and classical logic; the type system is correspondingly weak, and the 
distinction between type-checking and proof is blurred. 

The B-Toolkit [6], developed by BP and subsequently by B-Core UK Ltd.,
focuses on rigorous/formal design by supporting refinement from abstract spec- 
ification through to imperative code. Tools exist for supporting static analysis 
(type-checking), dynamic analysis (animation), design documentation, proof of 
refinement and code generation. Another support system for B is Atelier-B [4], 
provided by Steria Méditerranée, which offers similar functionality to the B-Core
Toolkit. 

Examples of the use of B include the development of communication security
protocols, subway speed control mechanisms, railway signalling, executable
database programs and IBM’s CICS product. A directory of B is maintained
at [5]. 



Co-use of VDM and B. Although VDM and B have the same expressive
power in theory, a comparison undertaken during the B User Trials project 
observed [22] that VDM encourages a style of specification where implicit invari- 
ants and explicit frames are employed with postconditions to describe operations 
as abstractly as possible, whereas the representation of operations with explicit 
invariants and implicit frames employed in B encourages overspecification and 
the introduction of implementation bias reducing possible non-determinism. This 
difference arises from the different focus of the two methods and has led to the 
development of different functionality in the supported forms of the methods. 
Figure 1 and Table 1 show the different phases of the lifecycle favoured by the 
two methods and the complementary features currently provided by each toolkit. 



3.2 MaFMeth: The Measurement and Analysis of a Formal Method 

The combination of VDM and B was first explored by RAL and Bull Informa- 
tion Systems in the MaFMeth Project [13,12]. This project developed part of a 
transaction management system using VDM for the initial design and analysis, 
and then translating into B for development and code generation. 

The formal development was part of the second release of an application integration
product of the type often known as “middleware”, which allows applications
to communicate in a number of ways via a single application programming
interface. Its primary function is to provide distributed, multi-platform, inter-application
message handling services involving message routing, storage, transformation
and enrichment transparently to the applications. The component of
the system which was developed formally monitors the status and contents of the
message queues and allows individual messages to be updated when required.

Fig. 1. The lifecycle identified for heterogeneous development using VDM & B

Table 1. The complementary functionality of the VDM and B Toolkits

Task                                       IFAD VDM-SL Toolbox   B-Toolkit
Requirements capture                       X                     X
Visualisation                              X                     X
Abstract specification                     √                     X
Type checking                              √
Prototype code generation                  √
Test coverage                              √                     X
Animation/Execution                        √                     √
Modularity                                 √                     √
Refinement                                 X                     √
Proof                                      X                     √
Final code generation                      X                     √
Design documentation                       X                     √
Version control/Configuration management   X

Key: √ = good support, ~ = some support, X = no support


The project was undertaken in a conventional system software department 
with a development process certified as ISO 9001 (TickIT) [44] compliant for its
quality management system and operating at a point close to level 3 of the 
SEI Capability Maturity Model [40]. The development process adopted in this 
project was influenced by the desire to assess a variety of formal techniques 
covering as much of the development life cycle as possible, and the requirement
that the resulting code had to be closely integrated with code developed by other
methods. 

The decision to employ a combination of VDM and B was motivated by the 
complementary facilities offered by the two toolkits and previous experience [22] 
which had shown the complementary strengths of the VDM and B methods. 
Three formal specifications were produced. The first, most abstract, specification 
was developed in VDM using VDM Through Pictures. This was translated by
hand into B Abstract Machine Notation, in order to conduct the first and second 
specification decomposition with the B-Toolkit, the result of which was then used 
to automatically generate C code. 

Three forms of analysis were undertaken for validation and verification. Ani- 
mation was used to validate the design during development, whereas post facto 
verification was undertaken using test cases and proof obligations which were 
generated from the specifications. 

Measurements relating to these activities were taken in order to compare the 
formal development process with the conventional one used in that department, 
and to compare the relative effectiveness of the various stages of the formal 
process. For the former, the results of a number of development projects, all 
producing sub-products with similar characteristics, were compared using the 
department’s existing programme of metrics. For the latter, faults were classified 
according to the development stage at which they were discovered and the stage 
at which they were introduced. 
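
The classification just described amounts to a two-way table of faults by stage of introduction and stage of discovery. The Python sketch below shows one way such a table could be built; the stage names and the fault records are invented for illustration and are not data from the MaFMeth project.

```python
from collections import Counter

# Hypothetical development stages, in lifecycle order.
STAGES = ["specification", "design", "code", "unit test"]


def classify(faults):
    """Count faults by (stage introduced, stage discovered)."""
    return Counter((f["introduced"], f["discovered"]) for f in faults)


def escaped(matrix):
    """Faults discovered in a later stage than the one that introduced them."""
    order = {stage: index for index, stage in enumerate(STAGES)}
    return sum(
        count
        for (introduced, discovered), count in matrix.items()
        if order[discovered] > order[introduced]
    )


# Made-up fault records, purely to exercise the functions.
faults = [
    {"introduced": "specification", "discovered": "specification"},
    {"introduced": "specification", "discovered": "design"},
    {"introduced": "design", "discovered": "unit test"},
    {"introduced": "code", "discovered": "code"},
]
matrix = classify(faults)
late = escaped(matrix)
```

Entries on the diagonal of the table are faults caught in the stage that produced them; entries off the diagonal indicate how far a fault travelled before detection.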

3.3 SPECTRUM: A Step Toward a Unified Method 

Following MaFMeth, the SPECTRUM project also assessed the benefit of the 
combined use of VDM and B within a single development process. The project 
was a collaboration with four user partners, each considering one application domain:

— GEC Marconi Avionics considering avionics systems, 

— Dassault Electronique considering terrestrial transport embedded control, 

— Space Software Italia considering satellite communication control, and 

— Commissariat à l’Energie Atomique considering nuclear plant control;

and two technology suppliers: 

— the Institute of Applied Computer Science (IFAD), suppliers of the VDM-Tools,
and

— B-Core UK Ltd. suppliers of the B-Toolkit. 

The Rutherford Appleton Laboratory fulfilled two roles:

— Expertise in VDM and B and their combination, and

— Neutral Project Coordination. 

The SPECTRUM project had three objectives: 

— to assess the cost-effectiveness, for the user partners, of a development process
employing an integration of the VDM and B formal technologies;

— to determine the technical feasibility of this integration; and

— to investigate the commercial potential for the tool suppliers of the integra- 
tion of supported forms of VDM and B. 

The first objective was addressed through the development of user scenarios 
exploring the utility of the various functions available from the two toolkits and 
the advantages of employing them in terms of development cost and product 
quality. The scenarios were in the application domains of avionics systems and 
terrestrial transport embedded control. They were also reviewed from the per- 
spectives of satellite communication control and nuclear power plant control. 
Further details of the case studies can be found in [28,43,3]. 

The second objective was addressed through investigations into the feasibility
of automating support for the translations between VDM and B. An approach to 
translation requiring the analysis of usage of types in the flat VDM specification 
in order to synthesise a useful decomposition into modules in B was proposed. 
Further details of the translations can be found in [38,21]. This work formed a 
key input to the VDM+B project described in Section 4.1. 

To address the third objective, the project assessed the commercial case for 
the provision of required functionality. It explored the overall cost-benefits of 
introducing the proposed method and the commercial viability of the proposed 
tool integration. It prepared a detailed market analysis and pricing policy for 
the combined tools. The results of this work are considered to be commercial in 
confidence. 



3.4 Evidence of Error Reduction 

The MaFMeth project undertook measurement and analysis of the total num- 
ber of faults introduced and found during development. Despite the use of two 
notations and the lack of integrated tool support, quantitative analysis of the 
faults found at unit test shows the approach to be very effective both in cost 
and quality. Figure 2 compares data from this project with three others under- 
taken by the user partner using structured design. The four projects were all 
developed in the same environment over a period of about 3 years and all used 
a similar development process apart from the technology involved. All projects 
were undertaken by engineers from the same development group and all were 
fragments of much larger developments. Similar testing procedures, based on 
manual identification of tests, were followed in each case. All, bar project 2, 
were new developments, whereas project 2 was a complex modification to an 
already heavily maintained system software component (hence, perhaps, the low 
productivity and quality of that development). None of the effort figures include 
the learning and technology transfer time which is inevitable in applying new 
approaches. 

The LOC figure (Lines of Code) is clearly central to the metrics and, for
projects 1 to 3, refers to C language statements. For MaFMeth, 8000 lines of
code were generated in all. However, much of this arose from library components.
The figure of 3500 lines of code is the developer’s estimate of the amount of code
that would have been produced to implement the same functionality without
attempting any reuse. In fact, 1200 lines of implementation level B notation
were produced to generate the final C code.

                     Project 1        Project 2            Project 3        MaFMeth
Application          System software  Transaction monitor  System software  System software
                     utilities        modifications        application      middleware
Approach             Yourdon          Yourdon              VDM / Yourdon    VDM / AMN
Size (LOC)           3000             1100                 1300             ** 3500
Effort (days)        65               80                   27               43
Effort / KLOC        21.5             72.5                 20.5             12.5
Faults at unit test  27               17                   7                3
Faults / KLOC        9                15.5                 5.5              0.9

** Normalised against amount of library code used. (Total was 8000).

Fig. 2. Comparison of numbers of faults found

The figures show that the MaFMeth project produced, on average, more 
code per day than any of the previous projects. Of course, this result must be 
tempered by the different application areas and the possible inaccuracy in the 
estimate of the equivalent number of lines of code. However, the improvement 
of nearly 100 % is noteworthy. 

Even more significant are the results concerning the number of faults at unit
test. The unit testing aimed at 100 % functional black box test coverage
and 100 % branch level white box coverage. This was achieved by identifying test
cases using techniques including equivalence partitioning, boundary value
analysis and a judicious amount of error guessing! The MaFMeth project produced
fewer than 20 % of the faults per KLOC of the next best project.
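
The normalised figures in Fig. 2 can be recomputed from the raw columns. The following Python sketch does so, assuming only the size, effort and fault counts transcribed from the table; the names `projects`, `per_kloc` and `summarise` are illustrative and not part of the original study. The recomputed ratios agree with the published columns to within a few tenths, which suggests the published values were rounded.

```python
# Raw figures transcribed from Fig. 2: size in lines of code, effort in days,
# faults found at unit test. The MaFMeth size is the normalised estimate (3500).
projects = {
    "Project 1": {"loc": 3000, "effort_days": 65, "faults": 27},
    "Project 2": {"loc": 1100, "effort_days": 80, "faults": 17},
    "Project 3": {"loc": 1300, "effort_days": 27, "faults": 7},
    "MaFMeth":   {"loc": 3500, "effort_days": 43, "faults": 3},
}


def per_kloc(value, loc):
    """Normalise a raw count against size in thousands of lines of code."""
    return value / (loc / 1000)


def summarise(data):
    """Return effort/KLOC, faults/KLOC and lines per day for each project."""
    result = {}
    for name, p in data.items():
        result[name] = {
            "effort_per_kloc": per_kloc(p["effort_days"], p["loc"]),
            "faults_per_kloc": per_kloc(p["faults"], p["loc"]),
            "loc_per_day": p["loc"] / p["effort_days"],
        }
    return result


summary = summarise(projects)
for name, s in summary.items():
    print(f"{name:10s} effort/KLOC={s['effort_per_kloc']:5.1f} "
          f"faults/KLOC={s['faults_per_kloc']:4.1f} "
          f"LOC/day={s['loc_per_day']:5.1f}")
```

Dividing the MaFMeth faults/KLOC value by that of Project 3 gives the ratio behind the "fewer than 20 %" comparison above.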

No attempt was made to moderate the effectiveness of fault finding by the
severity of the faults found. Such an analysis should contribute to an estimate of the
cost-effectiveness of each activity.

Unfortunately, no figures for faults found during validation testing and cus- 
tomer use are available. 

3.5 Evidence of Cost Reduction 

Further evidence of cost reduction and comparison with costs for other development
approaches was provided by GEC Marconi in SPECTRUM. Figure 3 shows
an abstracted graph plotting the expected cost of using different formal methods
in three projects over recent years.

The two horizontal lines represent the expected cost of employing their informal
development process for safety critical and non-safety critical system
components.

The graph shows the falling cost of formal methods over recent years and in
particular emphasises that the cost is now competitive for safety critical systems
and approaching viability for non-critical developments.

The vertical bars associated with the data points represent the high margin
for error in predictions made from a few small projects.



[Figure 3: graph of COST against TIME (94, 96, 98, 00), with two horizontal
lines labelled Critical and Non-critical]

Fig. 3. Comparison of numbers of faults found



3.6 Relative Benefit of Formal Activities 

The MaFMeth project also undertook an analysis of the relative cost/benefits
of each development activity. For these purposes the development process was
divided into 13 activities, with varying degrees of tool support. These are
depicted in Figure 4. The distribution of effort by activity is shown in
Figure 5. Some activities, for example the initial B specification and its
animation, are grouped together as they were carried out simultaneously and no
separate effort figures were kept.

Fig. 4. Development activities identified in the MaFMeth project



As might have been expected, the bulk of the design effort was in the main 
development in B. A substantial component was also expended on the early 
specifications in VDM. Very little effort was required during the testing stage. 

The faults found can be plotted against these efforts as a histogram with 
the width of columns representing the relative effort expended in each stage 
(Figure 6). However, when inspecting this it must be remembered that some 
stages involved development whereas others purely involved review. For stages 
B1-2, one cannot assess how much effort was expended in finding faults through
animation and how much on development, but if one assumes that approximately 
one half of this effort was spent on each activity, then the dotted line applies. 

Note how the most efficient fault finding occurs during test generation, ani- 
mation and proof. Although this can perhaps be attributed to the fact that most 
faults were found before testing occurred, the test generation and proof stages 
allow a different perspective on the specification and highlight problems which 
might otherwise be invisible to the developer. 



Fig. 5. Effort expended by each project stage

Fig. 6. Faults found per day by project stages



4 Ongoing Work and Conclusions

This section outlines two current projects which bring together the three
lines of research described above: the VDM+B project, building on lines 1 and 2
(the advancement and technology transfer of formal methods), and the Subsystems
project, building on lines 1 and 3 (the advancement of formal and industrial
methods). It then draws some overall conclusions.



4.1 The Integration of Two Industrially Relevant Formal Methods 
(VDM+B) 

The VDM+B project (1998-2001) is the third project on VDM and B integration,
building on the results of the MaFMeth and SPECTRUM projects. In
recognition of the pragmatic nature of the earlier work, the goal of this project
is to establish a formal foundation for heterogeneous development in VDM and B.

An obvious point of concern is the foundational differences in the languages. 
VDM is based on the 3-valued Logic of Partial Functions (LPF) whereas B is
based on classical First Order Predicate Calculus. Work on developing proof 
support for VDM [2] has shown that in a framework with dependent types, such 
as PVS [39], most specifications which employ partial functions for their expres- 
sivity can be directly translated to functions which are total over a subdomain. 
The remaining uses of partiality represent a particular form of lazy concurrent 
disjunction which is built into LPF but not available in B. 
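
The difference between the two logics can be illustrated on disjunction. The Python sketch below is an illustration of the three-valued truth table only, not of how LPF is mechanised in any VDM or B tool: `None` stands for an undefined operand, exceptions model the application of a partial function outside its domain, and the names `Tri`, `evaluate`, `lpf_or` and `guarded_or` are invented for this example. The function `guarded_or` shows the classical alternative, in which the partial term is only evaluated on the subdomain where it is defined.

```python
from typing import Callable, Optional

Tri = Optional[bool]  # True, False, or None for "undefined"


def evaluate(term: Callable[[], bool]) -> Tri:
    """Evaluate a term; a partial function applied outside its domain gives None."""
    try:
        return term()
    except (ZeroDivisionError, KeyError, IndexError):
        return None


def lpf_or(left: Callable[[], bool], right: Callable[[], bool]) -> Tri:
    """Three-valued disjunction: true if either operand is true,
    even when the other operand is undefined, in either order."""
    a, b = evaluate(left), evaluate(right)
    if a is True or b is True:
        return True
    if a is False and b is False:
        return False
    return None


def guarded_or(x: float) -> bool:
    """Classical rendering of `x = 0 or 1/x > 1`: the division is
    guarded so that it is only evaluated where x is non-zero."""
    return x == 0 or (x != 0 and 1 / x > 1)


x = 0
print(lpf_or(lambda: x == 0, lambda: 1 / x > 1))   # defined operand first
print(lpf_or(lambda: 1 / x > 1, lambda: x == 0))   # undefined operand first
print(guarded_or(x))
```

Both calls to `lpf_or` give the same answer regardless of operand order, which is the symmetric behaviour that a sequential, short-circuit disjunction does not provide.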

Although the two notations are founded on different logics, the proof rules
in the B-Toolkit do have a flavour more akin to those of VDM where typing
hypotheses are used as guards to the expression construct introduction and 
elimination rules. In the absence of a standard form for proofs that would enable 
proofs developed in one system to be checked with another, it is important for the 
certification of formal developments to be able to “second source” the theorem 
proving capability. This will allow proof support to be developed in a number 
of systems and contribute to the certification of theorem proving capability for 
use in safety critical systems. Current support for the languages has not been 
certified in this way. 

A further area of difference is higher level modular structuring. Several
approaches to modularisation exist for VDM. The VDM standard language,
VDM-SL, has no structuring mechanism, although a form of modularisation is
given as an informative annex; the IFAD VDM Toolbox supports a simple form
of modules; and VDM++ [35] has an object-oriented notion of structuring based
on classes. On the other hand, the ability to incrementally present a specification
is central to B where implementations can be constructed in a structured way 
by composing implementations of separate components. 

Thus transformations between structured specifications in the two formalisms 
should, in some sense, preserve the locality of information. For example, in mov- 
ing from a single module of VDM where the structure is based around a hier- 
archical definition of record types, we would hope to achieve a B specification 
which used machines to mirror the structure of the records. The danger is that 
in “coding up” such a complex refinement into the translation we risk the sound- 
ness of the translation. One possible approach [38] is for the translation to result 
in two levels of B specification and a refinement between them. In this way the 
translation is kept simple, whilst the complexity of the refinement is localised 
within the one formalism and hence more amenable to verification. 

4.2 Objects, Associations and Subsystems: A Hierarchical 
Approach to Encapsulation 

The Subsystems project (1999-2002) arose from observations made in the Formal
Underpinning of Object Technology project. That project observed that
although subtyping and inheritance provide a hierarchical means of classification
of objects, the class-instance paradigm is essentially flat and does not directly
support the nesting of objects within objects. This led us to propose a notion
of subsystem which generalises the class-instance-based concept of object,
yielding an approach to system specification employing object-like encapsulation
in a nested hierarchy of components [16].

The strength of these subsystems lies in generalising key features of the suc- 
cess of object orientation. Objects provide a simple yet powerful basis for modu- 
larity through encapsulation. Aggregation of attributes in objects, and objects in 
associations, provides a basis for data-encapsulation; object identifiers globally 
identify instances and give an implicit indirection which distinguishes attributes 
which are themselves objects from attributes which are pure values. Objects can 
also provide a basis for establishing non-interference in concurrent
implementations. On this basis, it seems that the OO approach would benefit from
an old idea: hierarchical structuring.

In [16], we observed that the compositional interpretation of object-oriented
designs requires the identification of theories intermediate between those of the
constituent classes and associations and that of the entire system, and that many
constructions are naturally interpreted in theories corresponding to identified
parts of the overall system. This project will investigate subsystems as
first-class objects in OO system description, achieving a hierarchical form of
object-orientation.



4.3 Conclusions 

We have presented some key results of two projects measuring the costs and
benefits of the formal approach. This evidence indicates that formal methods
are currently cost-neutral for safety critical systems and deliver higher quality
than alternatives. The evidence contributes to the case for the adoption
of formal techniques to spread beyond those application domains where
formal techniques are mandated or highly recommended, such as defense and
transport (cf. UK Def-Stan 00-55 and French RATP requirements), to other
safety and financially critical applications.

We have indicated the falling cost of formal methods, which are now becoming
competitive with other methods used in non-critical systems development.
Further cost reduction, in particular below the cost of systems development for 
non-critical systems, will increase the market enormously. 

However, conclusions drawn from these projects should be moderated by the
small size of the developments and the fact that the development teams were also
small and staffed by self-selected individuals who, being keen to make a success
of the experiments, were perhaps better motivated than average. It would not
be wise therefore to extrapolate these results to larger projects.

Despite these qualifications, there is some evidence in these results in favour 
of formal methods. Faults are inevitable and their detection is aided by formal- 
isation. It seems that any analysis, whether animation, proof obligation genera- 
tion, proof, or testing, is worthwhile. These activities are only possible once the 
objects involved are formalised.

The three lines of research described provide three legs of support for
industrial-strength formal methods: improved methods and support for them; evidence
of the costs and benefits of their use; and formal underpinning of established
industrial methods.

This work has contributed to an accumulation of evidence for the benefits of 
formal methods. It has raised awareness of the need to gather such evidence for 
larger projects and has demonstrated some techniques for doing so. 

References 

1. J.-R. Abrial, The B-Book: Assigning Programs to Meanings, Cambridge University
Press, ISBN 0-521-49619, 1996.

2. S. Agerholm, Translating Specifications in VDM-SL into PVS, in Higher Order
Logic Theorem and Its Applications: 9th International, LNCS 1125,
Springer-Verlag, 1996.

3. S. Agerholm, P-J. Lecoeur and E. Reichert, Formal Specification and Validation at
Work: A Case Study using VDM-SL, Proceedings of Second Workshop on Formal
Methods in Software Practice, Florida, ACM, March 1998.

4. Atelier B, Steria Mediterranee, The B Page URL,
http://www.atelierb.societe.com/PAGE_B/uk/bhomepg.htm

5. The B method (virtual library page),
http://www.comlab.ox.ac.uk/archive/formal-methods/b.html

6. B-Core (UK) Ltd, B-Toolkit User’s Manual, Version 3.0, 1996. For details,
contact Ib Sorensen, B-Core (UK) Ltd, Magdalen Centre, Robert Robinson Avenue,
The Oxford Science Park, Oxford OX4 4GA. Tel: +44 865 784520. Email:
Ib.Sorensen@comlab.ox.ac.uk, WWW: http://www.b-core.com/

7. J. Bicarregui (Ed.), Proof in VDM: Case Studies, Springer-Verlag, FACIT series,
1998.

8. J. C. Bicarregui. Intra-Modular Structuring in Model-Oriented Specification:
Expressing Non-Interference with Read and Write Frames. Ph.D. Thesis,
University of Manchester (UMCS-95-10-1).

9. J. C. Bicarregui. Operation Semantics with Read and Write Frames. Proceedings
of the 6th Refinement Workshop, David Till (Ed.), Springer-Verlag.

10. J. C. Bicarregui. Algorithm Refinement with Read and Write Frames. Proceedings
of Formal Methods Europe ’93, Woodcock and Larsen (Eds.), LNCS 670,
Springer-Verlag.

11. J. C. Bicarregui et al. Formal Methods Into Practice: case studies in the application
of the B Method. I.E.E. Proceedings Software Engineering, Vol. 144, No. 2,
1997.

12. J. C. Bicarregui, J. Dick, B. Matthews, E. Woods. Making the most of formal
specification through Animation, Testing and Proof. Science of Computer
Programming, Vol. 29 (1997), Elsevier Science.

13. J. C. Bicarregui, J. Dick, E. Woods. Quantitative Analysis of an Application of
Formal Methods. Proceedings of FME’96, Third International Symposium of Formal
Methods Europe, LNCS, Springer-Verlag.

14. J. C. Bicarregui, J. Dick, E. Woods. Supporting the length of formal development:
from diagrams to VDM to B to C. Proceedings of 7th International Conference on:
“Putting into practice method and tools for information system design”, Nantes
(France), October 1995, IUT de Nantes, H. Habrias (Ed.), ISBN 2-906082-19-8.






15. J. C. Bicarregui, J. S. Fitzgerald, P. A. Lindsay, R. Moore and B. Ritchie.
Proof in VDM: A Practitioner’s Guide. FACIT, Springer-Verlag,
ISBN 3-540-19813-X.

16. J. C. Bicarregui, K. C. Lano and T. S. E. Maibaum, Objects, Associations and
Subsystems: a hierarchical approach to encapsulation, Proceedings of ECOOP’97,
11th European Conference on Object-Oriented Programming, Jyväskylä, Finland,
June 1997.

17. J. C. Bicarregui, K. C. Lano and T. S. E. Maibaum. Towards a Compositional
Interpretation of Object Diagrams, in “Algorithmic Languages and Calculi”,
Proceedings of IFIP TC2 working conference, Strasbourg, Feb. 1997, Bird and
Meertens (Eds.), Chapman and Hall, ISBN 0-412-82050-1, September 1997.

18. J. C. Bicarregui, B. M. Matthews, The specification and proof of an EXPRESS to
SQL “compiler”, in Proof in VDM: Case Studies, J. C. Bicarregui (Ed.), FACIT
series, Springer-Verlag, 1998.

19. J. C. Bicarregui, B. Matthews. Formal Methods in Practice: a comparison of
two support systems for proof. SOFSEM’95: Theory and Practice of Informatics,
Bartosek et al. (Eds.), LNCS 1012, Springer-Verlag.

20. J. C. Bicarregui, B. M. Matthews. Formal perspectives on an Object-Based
Modelling Language. Proceedings of the 6th EXPRESS User Group Conference, Toronto,
Canada. IEEE Computer Society, ISBN 0-8186-8641-3, 1996.

21. Juan Bicarregui, Brian Matthews, Brian Ritchie, and Sten Agerholm.
Investigating the integration of two formal methods, Formal Aspects of Computing,
Vol. 10, pp. 532-549, Springer-Verlag, December 1998.

22. J. C. Bicarregui and B. Ritchie. Invariants, Frames and Postconditions: a
comparison of the VDM and B notations. Proceedings of Formal Methods Europe ’93,
Woodcock and Larsen (Eds.), LNCS 670, Springer-Verlag. b. 1995.

23. J. C. Bicarregui and B. Ritchie. Reasoning about VDM Developments using the
VDM Support Tool in Mural. Proceedings of VDM 91, Prehn and Toetenel (Eds.),
LNCS 552, Springer-Verlag.

24. J. C. Bicarregui and B. Ritchie. Supporting Formal Software Development and
The Mural VDM Support Tool, in “Mural, A Formal Development Support
System.” Jones, C. B. et al. Springer-Verlag, ISBN 3-540-19651-X.

25. J. C. Bicarregui and B. Ritchie. Providing Support for the Formal Development
of Software. Proceedings of the First International Conference on Systems
Development Environments and Factories. Madhavji, Schafer and Weber (Eds.), Pitman.

26. D. Clutterbuck, J. C. Bicarregui and B. M. Matthews. Experiences with Proof in
a Formal Development, Proceedings of 1st International Conference on B, Institut de
Recherche en Informatique de Nantes, France, November 1996.

27. A. J. J. Dick and J. Loubersac. A Visual Approach to VDM: Entity-Structure
Diagrams, Technical Report DE/DRPA/91001, Bull, 68, Route de Versailles, 78430
Louveciennes (France), 1991.

28. J. Draper (Ed.), Industrial benefits of the SPECTRUM approach, SPECTRUM
Project External Deliverable 1.3, 1997. See
http://www.itd.clrc.ac.uk/Activity/SPECTRUM.

29. D. A. Duce, F. Paterno, Lotos Description of GKS-R Functionality, Formal
Methods in Computer Graphics Workshop, Eurographics Association, 1991.

30. R. Elmstrøm, P. G. Larsen, and P. B. Lassen, The IFAD VDM-SL Toolbox: A
Practical Approach to Formal Specifications, ACM Sigplan Notices 29(9),
pp. 77-80, 1994. For more information see http://www.ifad.dk/






31. J. S. Fitzgerald, Modularity in Model-Oriented Specification and its Interaction
with Formal Reasoning. Ph.D. Thesis, University of Manchester, 1991.

32. ISO. Information Technology - Programming Languages - Vienna Development
Method - Specification Language. Part 1: Base Language. ISO 13817-1, 1996.

33. C. B. Jones, Systematic Software Development using VDM. Prentice Hall, second
edition, 1990.

34. C. B. Jones and C. A. Middelburg, A Typed Logic of Partial Functions
Reconstructed Classically, Acta Informatica, 1994.

35. K. Lano, S. Goldsack, Integrated Formal and Object-oriented Methods: The
VDM++ Approach, 2nd Methods Integration Workshop, Leeds Metropolitan
University, April 1996.

36. P. G. Larsen and W. Pawlowski, The formal Semantics of ISO VDM-SL, Computer
Standards and Interfaces, Volume 17, numbers 5-6, 1995.

37. B. M. Matthews, MERILL: An Equational Reasoning System in Standard ML,
5th Int. Conf. on Rewriting Techniques and Applications, pp. 414-445, C. Kirchner
(Ed.), LNCS 690, Springer-Verlag, 1993.

38. B. Matthews, B. Ritchie, and J. Bicarregui. Synthesising structure from flat
specifications, Proc. of the 2nd International B Conference, Montpellier, France,
April 22-24, 1998. Springer-Verlag, LNCS 1393.

39. S. Owre et al. PVS: Combining Specification, Proof Checking, and Model Checking,
Computer-Aided Verification, CAV’96, Rajeev et al. (Eds.), Springer-Verlag, LNCS
1102, 1996.

40. M. C. Paulk, W. Curtis, M. B. Chrissis, C. V. Weber, Capability Maturity Model for
Software, Version 1.1, Carnegie Mellon University Software Engineering Institute
Technical Report, CMU/SEI-93-TR-24, February 1993.

41. B. Ritchie, J. C. Bicarregui and H. Haughton. Experiences in Using the Abstract
Machine Notation in a GKS Case Study. Proceedings of Formal Methods
Europe ’94, Naftalin, Denvir and Bertran (Eds.), LNCS 873, Springer-Verlag.

42. J. M. Spivey. The Z Notation, A Reference Manual. Prentice Hall,
ISBN 0-13-983768-X, 1987.

43. H. Treharne, J. Draper and S. Schneider, Test Case Preparation Using a Prototype,
Proc. of 2nd Int. B Conf. B’98: Recent Advances in the Development and use of
the B Method, LNCS 1393, Springer-Verlag, April 1998.

44. U. K. Department of Trade and Industry, TickIT: Guide to Software Quality
Management, System Construction and Certification using ISO9001/EN29001/BS5750
Part 1, TickIT Project Office, 68 Newman Street, London, W1A 4SE, UK,
February 1992.

45. The VDM examples repository: http://www.ifad.dk/examples/examples.html.



Biomolecular Computing and Programming 

(Extended Abstract) 



Max H. Garzon, Russell J. Deaton, and The Molecular Computing Group 

The University of Memphis, Memphis TN, 38152 
www.msci.memphis.edu/~garzonm/mcg.html



Abstract. Molecular computing is a discipline that aims at harnessing 
individual molecules at nanoscales to perform computations. The best 
studied molecules for this purpose to date have been DNA and bacteri- 
orhodopsin. Biomolecular computing allows one to realistically entertain, 
for the first time in history, the possibility of exploiting the massive par- 
allelism at nanoscales for computational power. This talk will discuss 
major achievements to date, both experimental and theoretical, as well 
as challenges and major potential advances in the immediate future. 



Early ideas of molecular computing attempted to emulate conventional
electronic implementations in other media, e.g., implementing Boolean gates in a
variety of ways. A fundamental breakthrough characteristic of a new era was
made by Adleman’s 1994 paper in Science, where he reports an experiment
performed with molecules of fundamental importance for life, DNA
(deoxyribonucleic acid) molecules, to solve a computational problem known to be
difficult for ordinary computers, namely the HAMILTONIAN PATH PROBLEM (HPP).
There is good evidence, however, that several million years earlier and unknown
to all of us, the ciliated protozoa Oxytricha nova and Oxytricha trifallax of the
genus Oxytricha solved a problem similar to HPP while unscrambling genes as
part of their reproductive cycle. It has been argued that other molecules such
as proteins and artificially engineered ribozymes may serve biological and
computational functions much better. An older alternative to DNA molecules that
supports optical computing is the protein bacteriorhodopsin, which contains the
light sensitive rhodopsin present in vertebrate retinas. The light activated state
switching property has been used in combination with lasers to create a storage
medium for optical computer memories that is almost in the commercial stage
now. The possibility exists that it might become a core memory for a molecular
computer. Although certainly involving amino acids at the protein-binding
sites, this type of computation is more passive than Adleman’s type. ‘Molecular
computing’ thus essentially means today DNA- and RNA-based computing.

Adleman’s seminal insight was to carefully arrange a set of DNA
molecules so that the chemistry that they naturally follow would perform the
brunt of the computational process. The key operations in this chemistry are
sticking operations that allow the basic nucleotides of nucleic acids to form larger
structures through the processes of ligation and hybridization. The first
DNA-based molecular computation is summarized in Fig. 1. Specifically, Adleman
assigned (well chosen) unique single-stranded molecules to represent the
vertices, used Watson-Crick complements of the corresponding halves to represent
edges joining two vertices, and synthesized a picomole of each of the 21 resulting
molecules for the graph in Fig. 1(a). Taking advantage of the fact that molecular
biologists have developed an impressive array of technology to manipulate
DNA, he designed a molecular protocol (one would say algorithm in computer
science) that enabled the molecules to stick together in essentially all possible
ways. In the situation illustrated in Fig. 1(b), the edge molecules were to splint
nearby vertex molecules to construct longer and longer molecules representing
paths in the original graph. If the Hamiltonian path called for in the
problem specification exists, one representative molecule would thus be created by
the chemistry on its way to equilibrium. Using more of the same biotechnology he
could then determine, as illustrated in Fig. 1(c), the presence or absence of the
molecule in the final test tube and respond accordingly to the original problem.
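
The generate-and-filter structure of this experiment can be mimicked in software. The following Python sketch is only an analogy under stated assumptions: it enumerates walks one after another where the chemistry forms them in parallel, the four-vertex directed graph is an invented example rather than Adleman's instance, and the names `grow_paths` and `extract` are illustrative.

```python
def grow_paths(edges, max_vertices):
    """Stand-in for hybridization and ligation: build every walk of at most
    `max_vertices` vertices that follows the directed edges."""
    walks = [tuple(edge) for edge in edges]
    frontier = list(walks)
    while frontier:
        next_frontier = []
        for walk in frontier:
            if len(walk) == max_vertices:
                continue  # long enough; do not extend further
            for a, b in edges:
                if a == walk[-1]:
                    next_frontier.append(walk + (b,))
        walks.extend(next_frontier)
        frontier = next_frontier
    return walks


def extract(walks, vertices, start, end):
    """Stand-in for extraction: keep walks that begin at `start`, finish at
    `end`, and visit every vertex exactly once."""
    n = len(vertices)
    return [
        w for w in walks
        if w[0] == start and w[-1] == end
        and len(w) == n and set(w) == set(vertices)
    ]


# Hypothetical instance: four vertices, six directed edges.
vertices = [0, 1, 2, 3]
edges = [(0, 1), (1, 2), (2, 3), (0, 2), (1, 3), (2, 1)]

candidates = grow_paths(edges, len(vertices))
answers = extract(candidates, vertices, start=0, end=3)
print("Hamiltonian path present:", bool(answers))
```

As in the laboratory protocol, the answer to the decision problem is simply whether anything survives the extraction step.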

Various factors have made molecular computing possible. Critical among
them is the possibility of synthesizing biomolecules at low costs and manipulating
them with relative ease despite their nanometric size. These complex molecules
are composed of basic blocks called nucleotides, nucleic acid bases A (adenine),
G (guanine), C (cytosine), T (thymine) (or U (uracil) in RNA), that bind to form
chains called oligonucleotides, or n-mers, according to the Watson-Crick (herein
abbreviated as WC) complement condition, A = T and C = G, and vice versa.
Each molecule has a polarity (sense of orientation) from a so-called 5'-end to a
3'-end or vice versa. Their physical implementation is therefore relatively simple
compared to the demanding and costly fabrication processes used in VLSI. Key
manipulations include gel electrophoresis (a powerful microscope that permits
us to see molecules, or rather populations thereof, with the naked eye), cleaving
by restriction enzymes (for cut-and-paste), copying by PCR (Polymerase Chain
Reaction), and complex combinations of these basic building blocks.
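
The WC complement condition and the role of polarity can be made concrete at the level of strings. The Python sketch below assumes an idealised model in which two strands bind only when they are perfect complements; real hybridization depends on thermodynamics and tolerates mismatches, which this ignores. The names `WC_PAIR`, `reverse_complement` and `hybridizes`, and the two 8-mer strands, are hypothetical.

```python
# Watson-Crick pairing for DNA bases: A pairs with T, C pairs with G.
WC_PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(strand: str) -> str:
    """Return the partner of `strand`, both written from 5'-end to 3'-end.
    The reversal accounts for the opposite polarity of the two strands."""
    return "".join(WC_PAIR[base] for base in reversed(strand))


def hybridizes(a: str, b: str) -> bool:
    """True if two equal-length strands are perfect WC partners."""
    return len(a) == len(b) and reverse_complement(a) == b


# Two hypothetical vertex strands, written 5' to 3'.
v1 = "ACGTTGCA"
v2 = "GGATCCTA"

# An edge strand complementary to the 3' half of v1 followed by the 5' half
# of v2 can hold the two vertex strands next to each other.
junction = v1[4:] + v2[:4]
splint = reverse_complement(junction)
print(splint, hybridizes(junction, splint))
```

Applying `reverse_complement` twice returns the original strand, which reflects the "and vice versa" in the complement condition.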

The basic methodology in molecular computing to solve computational
problems consists of three basic steps: an encoding that maps the problem onto
DNA strands, hybridization/ligation that performs the basic core processing,
and extraction that makes the results visible to the naked eye, as illustrated
in Fig. 1. Examples are parallel overlap assembly to generate potential
solutions to problems before filtering them according to constraints at hand (e.g.,
Hamiltonicity), massively parallel Boolean-circuit evaluation, and whiplash PCR to
implement state transitions in molecules. Potential applications include DNA
fingerprinting, DNA population screening, and DNA sequencing. A ‘killer
application’ would suit well the nature of biomolecules (e.g., bypass digitization),
beat current and perhaps even future solid-state electronics, and would establish
beyond the shadow of a doubt the power of the new computational paradigm,
but remains to be unearthed.

The grand challenges for molecular computing are to improve its reliability,
efficiency, and scalability. The reliability of a protocol, i.e., a DNA
computation, is the degree of confidence that a lab experiment provides a true answer
to the given problem. The efficiency of the protocol refers to the intended and
effective use of the molecules that intervene in it. The scalability of a lab
experiment is the effective reproducibility of the experiment with longer molecules
that can encode larger problem instances while still obtaining equally reliable
results under comparable efficiency. These three are distinct but clearly
interrelated problems. Biologists have not really faced these problems in their work
because, in that field, the definition of success is different from that in computer
science. (When a biologist claims that she has cloned an organism, for example,
the contention is that one experiment was successful, regardless of how many
were previously not, or whether only one clone was actually produced. This kind
of success rate is simply unacceptable in computer science.) Research on these
problems in molecular computing has just begun. Their difficulty is rooted in our
relatively poor ability to control the physical chemistry involved in the context
of information processing, despite the impressive progress in biotechnology that
has made it thinkable. The computation and extraction phases therefore rely on
cooperative phenomena that can only be observed as ensemble statistical
processes involving a great number of individual molecules. Success in these areas is
critical for the actual construction of a molecular computer and is likely to have
a feedback effect on the notions of efficiency and scalability in biology. Success
will likely require self-controlled and autonomous protocols that would eliminate
human intervention, and so reveal more about the true power of molecular
computing. Consequently, traditional measures of algorithmic efficiency will require
re-examination for measuring molecular computing efficiency.

Fig. 1. ... instance; (b) computing reactions; (c) extraction.

In summary, important events have taken place in the field of biomolecu- 
lar computing in the last five years. There indeed remain enormous scientific, 
engineering, and technological challenges to bring this paradigm to full fruition. 
Nonetheless, the potential gains and advantages of computing with biomolecules 
are likely to make it a fascinating and intriguing competitive player in the land- 
scape of practical computing in a not too distant future. 

The full version of this abstract can be found in the survey by the authors [84].
The reference list from this survey is reproduced below for the convenience of
the reader. Up-to-date information can be found in several frequently updated
web pages that contain more references and useful links to molecular computing.
They include:

http://www.msci.memphis.edu/~garzonm/meg.html,

http://seemanlab4.chem.nyu.edu/, and

http://www.wi.LeidenUniv.nl/~jdassen/dna.html.



Abstract. Changeability (also called evolvability) is an essential property
of software. Software change is the foundation for both new software
development and legacy software maintenance; therefore, a better
understanding of software change is an important software engineering issue.
This paper covers selected topics related to software change, including
the minicycle of change, partitioned annotations, and change propagation,
and gives a brief overview of the field.



1 Introduction 

A brief look at computer science textbooks and journals reveals that most
computer science research is oriented towards the development of new software.
However, over the years a huge amount of software has been accumulated and is
in daily use. In [9], the size of the software installed in the U.S.A. is estimated 
to be 36,000,000 applications of a total size of 1,700,000,000 function points, 
equivalent to approximately 170,000,000,000 lines of code. This software has 
become an integral part of the economy and plays an important role in work 
productivity. It would be a great omission if the software research ignored this 
fact and concentrated only on the development of new software. In order to 
account for all software, both new and old, researchers should focus on the 
essential properties of software. 

The list of essential properties of software appeared in [3] and includes
invisibility, complexity, interoperability, and changeability (also called evolvability).
The experience of every software developer or maintainer confirms that software 
changes continually, both during development and during maintenance. 

Because of the importance of change, it is a significant research goal to make 
software changes easy, safe, and inexpensive. One of the solutions is to antic- 
ipate changes and structure the software in such a way that the changes will 
be localized inside software components [12]. When a change is localized within 
a component, it is easier, safer, and less expensive. However, more recent research 
indicates that not all changes in software can be anticipated. In the case study 
of [6] it was reported that approximately 70 % of the requirements were predicted
in advance and the remaining requirements were discovered during development.
Hence the software was already undergoing changes during development, because
the requirements were not accurately predicted. We believe that massive changes
triggered by company mergers, the introduction of the Euro, and the emergence of
graphical user interfaces and the Internet are examples of changes that could not
be predicted even a few years ago. Moreover, the current trend towards increasing
complexity and heterogeneity of applications will make accurate prediction of
changes even harder. It is likely that current applications will be exposed to
many unanticipated changes during their lifetime. Therefore, support for
unanticipated changes is an important research goal.

The paper is organized in the following way: Section 2 presents a minicycle 
of change. Section 3 presents a technique for program comprehension and redoc- 
umentation. Section 4 explains a formalism for change propagation. Section 5 
gives a brief overview of the other work in program change, related to the topics 
of this paper. Section 6 contains conclusions and a discussion of future research. 

2 Minicycle of Change 

In this paper, we deal with selected aspects of software change. In order to place 
these aspects in the proper context, let us first introduce the so-called software 
minicycle, around which this paper is organized. 

Software change is a process consisting of several phases: 

— Request for change 

— Planning phase: 

• Program comprehension 

• Change impact analysis 

— Change implementation: 

• Restructuring for change 

• Change propagation 

— Verification 

— Redocumentation 

A request for change is the specification of the change. It may be a bug report, 
requesting correction of a fault in software, or it may be a request for the intro- 
duction of a new functionality into the software. It may originate either from 
a user or a developer of software. If the change request is accepted, the change 
enters the planning phase. 

In the planning phase, the existing software must be comprehended first, 
before the change can be implemented. Depending on several factors, the process 
of program comprehension may be easy or difficult, see Section 3. 

The planning phase also contains change impact analysis, where the
programmers assess the extent and the difficulty of the change. Several techniques
and case studies of change impact analysis have been published in the literature;
see Section 5.

If the planning phase determines that the change in software is feasible, the
next phase is the implementation of the change. During the change
implementation, the software may first be restructured to make the change easier.
Change propagation is described in Section 4.

After the change has been implemented, its correctness is verified. After the
verification, the new functionality and the new structure of the software are
redocumented in the redocumentation phase.

It is understood that in the actual change process, the phases may overlap, be 
repeated, or additional phases may appear. For example if verification discovers 
problems, the minicycle may reenter one of the earlier phases. The actual phases 
of the minicycle are presented in the literature in several different forms [24], 
but they all capture the same basic idea. This paper avoids the complexities 
of the actual minicycle and presents this simplified version as a context for the 
techniques described in the following sections. 
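
The simplified minicycle can be sketched as a small loop over phases. The following is only an illustration under stated assumptions: the names `Phase`, `run_minicycle`, and the callback `verification_passes` are hypothetical, and re-entering at program comprehension after a failed verification is just one of the re-entry points the text allows.

```python
from enum import Enum


class Phase(Enum):
    # Phases of the simplified minicycle, in the order listed in the text.
    REQUEST = "request for change"
    COMPREHENSION = "program comprehension"
    IMPACT_ANALYSIS = "change impact analysis"
    RESTRUCTURING = "restructuring for change"
    PROPAGATION = "change propagation"
    VERIFICATION = "verification"
    REDOCUMENTATION = "redocumentation"


ORDER = list(Phase)  # Enum members iterate in definition order


def run_minicycle(verification_passes, reenter_at=Phase.COMPREHENSION,
                  max_attempts=5):
    """Return the list of phases visited during one change.

    verification_passes is a hypothetical callback: it receives the
    attempt number (starting at 1) and returns True when verification
    finds no problems.
    """
    trace = []
    start = 0
    for attempt in range(1, max_attempts + 1):
        for phase in ORDER[start:]:
            trace.append(phase)
            if phase is Phase.VERIFICATION and not verification_passes(attempt):
                break  # problems found: re-enter an earlier phase
        else:
            return trace  # reached redocumentation without a failure
        start = ORDER.index(reenter_at)
    raise RuntimeError("verification kept failing")


# Verification fails on the first attempt and succeeds on the second.
trace = run_minicycle(lambda attempt: attempt >= 2)
```

In this run the trace contains two verification steps, which mirrors the remark that the minicycle may re-enter an earlier phase when verification discovers problems.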

3 Partitioned Annotations 

The first task of the programmer is to understand the program. Program under- 
standing (or comprehension) is a subject of extensive research, see for exam- 
ple [13]. One of the research topics is strategies used in program comprehension. 

Two specific strategies occupy a prominent role: top-down and
bottom-up [4]. In the top-down strategy, the programmer approaches the task of program
comprehension by making hypotheses about the program. The programmer first
accepts the fundamental hypotheses and then refines them recursively by
subsidiary hypotheses. The hypotheses are verified by a search for evidence (called
“beacons”) in the code. If the evidence supports the hypotheses, they are accepted
as true. If the evidence contradicts the hypotheses, they are rejected and new
ones must be formed.

Bottom-up strategy is based on chunking, which consists of two steps: aggre- 
gation and abstraction. During aggregation, the programmer scans the code and 
groups together constructs of the program. During the step of abstraction, the 
programmer recognizes these aggregates as implementations of familiar concepts. 
For example, several lines of code may be an implementation of a stack or an 
implementation of a sorting algorithm, etc. The chunks can contain other chunks 
as parts, and the chunking process recursively finds larger and larger chunks until 
the whole program is understood. Programmers usually employ a combination
strategy with elements of both the top-down and the bottom-up process [22].

A part of program comprehension strategy is the fact that the programmer 
does not need to comprehend everything about the program in order to be able 
to make a change [10]. Usually it is sufficient to comprehend the relevant parts 
or aspects of the program. The information needs vary from task to task, and 
while there is a need to understand some parts of the program in great detail, 
other parts can be understood only roughly and still other parts scarcely at all. 

Program comprehension is an expensive part of software change, absorbing
on average more than half of all maintenance costs [7]. Yet with current practice,
valuable knowledge gained during the change is not recorded and is in danger
of being lost when the change is completed and the programmer turns his/her
attention to the next task. Partitioned annotations of software (PAS) [16] are
a tool that supports the process of program comprehension by providing a notebook
where the programmer can record the comprehension of the program.

PAS are structured as a matrix where one coordinate is the components of the 
program and the other coordinate is the partitions. The partitions are selected 
based on the needs of the project, and each partition contains a description of 
the software components from a specific point of view. For example, there can 
be a domain partition that describes the domain concepts that the particular 
component implements. Other partitions may describe the representation of con- 
cepts by data structures, archive the history of the component, document the 
quality of the testing, record unconfirmed hypotheses of a specific programmer 
about the component, etc. The selection of partitions is based on the needs of 
the project. 

PAS allow the programmer to read only those partitions that are related 
to his/her specific information needs. If a component needs to be understood 
in detail, the programmer is likely to read all partitions for the component. If 
on the other hand the component needs to be understood only partially, the 
programmer can read only the partitions relevant to his/her task. There is no 
need to limit the number of partitions and amount of information stored, because 
unneeded or mistrusted partitions will simply not be used. 




Fig. 1. Classes and interactions of Calendar program 



As a small example, consider a calendar program that maintains appoint- 
ments for a single user, and allows the user to enter, delete, and search appoint- 
ments. The components and interactions of the program are in Figure 1, and the 
partitioned annotations are in Tables 1 and 2. The components of the program 
are classes and the partitions are the domain, representation (i.e. data structures 
of the class and their meaning), and interactions of the class with other classes. 
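
The matrix structure of PAS can be sketched as a mapping from (component, partition) pairs to notes. This is a minimal sketch, not the HMS tool or the authors' implementation; the class name `PartitionedAnnotations` and its methods are hypothetical, and the sample notes paraphrase the annotations of the Calendar program.

```python
class PartitionedAnnotations:
    """Matrix of notes indexed by (component, partition)."""

    def __init__(self, partitions):
        self.partitions = tuple(partitions)
        self._notes = {}  # (component, partition) -> text

    def record(self, component, partition, text):
        # Only partitions chosen for the project may be filled in.
        if partition not in self.partitions:
            raise ValueError(f"unknown partition: {partition!r}")
        self._notes[(component, partition)] = text

    def read(self, component, partitions=None):
        """Return the requested partitions that have been filled in."""
        wanted = self.partitions if partitions is None else partitions
        return {p: self._notes[(component, p)]
                for p in wanted if (component, p) in self._notes}


pas = PartitionedAnnotations(["domain", "representation", "interactions"])
pas.record("date", "domain", "implements date in the form mm/dd/yyyy")
pas.record("date", "interactions", "base class for class time")
pas.record("time", "domain", "implements date, hour, and minute")

# A programmer who needs only a rough idea of "date" reads one partition.
rough = pas.read("date", ["domain"])
# A programmer who must change "date" reads every recorded partition.
detailed = pas.read("date")
```

Empty cells are simply skipped on reading, which reflects the observation that unneeded partitions cost nothing to ignore.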

For large systems, PAS are implemented as hypertext [19]. In that case, standard
Internet browsers like Netscape or Explorer can be used to browse through
the partitions. In order to guarantee the uniformity of the annotations, a PAS
generator called HMS was implemented [19] that parses the code and generates
hypertext skeletons for the annotations of the program. The skeletons are then
filled in by hand with the actual information.

A case study of the use of PAS in the maintenance of one million lines of 
C++ code was reported in [19]. In that project, PAS were used for incremen- 
tal program redocumentation in the following process: Whenever a maintenance 
programmer completes a change, he/she produces a “release note” that summa- 
rizes the knowledge of the program gained during the change. A specially trained 
software documenter uses this release note and records the new knowledge in the 
appropriate partitions of the PAS. Over the period of 15 months, approximately 
40 % of the code was redocumented and the documentation is concentrated in 
the most frequently maintained parts of the code. The extra cost involved in this 
redocumentation represents approximately a 15 % overhead, based on the time 
sheets produced for the customer. 

Table 1. Partitioned annotation of Calendar program, part 1 



user
    Domain:
        Class “user” implements user interface, supports insertion, deletion,
        and search for appointments
    Representation:
        N is user’s name
        C is instance of calendar
    Interactions:
        Class “user” depends on “eventList”

eventList
    Domain:
        Class “eventList” maintains the list of appointments, supporting
        addition, deletion and search
    Representation:
        L is list of pointers to “eventAbstr” and contains a list of appointments
    Interactions:
        Class “eventList” is a part of Composite pattern [8], inherits from
        “eventAbstr” and is used by class “user”

eventAbstr
    Domain:
        Class “eventAbstr” contains abstract appointment, consisting of
        beginning, end, and title
    Representation:
        Beg is beginning of appointment
        End is end of appointment
        Title is title of appointment
    Interactions:
        Class “eventAbstr” is a part of Composite pattern



For this extra expense, the following benefits were acquired: Any programmer
can maintain the redocumented code without incurring the extra cost of new
comprehension. This simplifies the scheduling of programmers for maintenance
tasks, because the first available programmer can be assigned to do the next
maintenance task. Also, when the programmers leave the project, their knowledge
of the code is preserved in the form of PAS annotations. The management and
customers of the project consider these benefits to be worth the extra expense.



Table 2. Partitioned annotation of Calendar program, part 2 



event
    Domain:
        Class “event” implements a simple appointment
    Representation:

    Interactions:
        Class “event” inherits from “eventAbstr”, part of Composite pattern

time
    Domain:
        Class “time” implements date, hour, and minute
    Representation:
        Ho contains the hour
        Mi contains the minute
    Interactions:
        Class “time” inherits from “date”

date
    Domain:
        Class “date” implements date in the form mm/dd/yyyy
    Representation:
        Mm is 2-digit month, dd is 2-digit day, yyyy is 4-digit year
    Interactions:
        Class “date” is the base class for class “time”



4 Change Propagation 

After the preliminary phases establish feasibility of a change, the change is imple- 
mented. The change implementation consists of several steps, each visiting one 
specific software component. If the visited component is modified, it may no 
longer fit with the other components because it may no longer properly interact 
with them. In that case secondary changes must be made in neighboring compo- 
nents, which may trigger additional changes, etc. This process is called change 
propagation. Although each change starts and ends with consistent software, 
during the change propagation the software is often inconsistent. 

This process is modeled by the graph rewriting of [14,11]. The basic notion
is an evolving interoperation graph (eig), defined in the following way: Let $C$ be
a set of components of the program. An interoperation or interaction between
two components $b, c \in C$, $b \neq c$, is formally represented as an unordered couple
$\{b, c\}$. An interoperation graph $G$ is a set of interoperations. Examples of
interoperations are function calls, data flows, use of shared resources, etc. The changes
propagate through interoperations from one component to the next.

For an interoperation graph $G$, an interoperation $\{b, c\} \in G$ is marked if there exists
an ordered couple $(b, c)$, called a mark. The existence of a mark intuitively means that
component $b$ was changed or inspected in the past, and component $c$ will be
changed or inspected in the future. A marked interaction can be inconsistent, i.e.
there could be a conflict in the interaction of the two components. An example
of such a conflict is a different number of arguments in a function call than in
the function declaration. If $(b, c) \in E$, then $c$ is called a marked component. An
evolving interoperation graph (eig) is a set $E$ of interoperations and marks such
that $(b, c) \in E$ implies $\{b, c\} \in E$. An eig $E$ is unmarked if there are no marks in $E$
and marked if there are marks in $E$.

For eig E, define the following sets: 



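$$
\begin{aligned}
G(b) &= \{\{b, c\} \mid \{b, c\} \in E\} && \text{(interoperations of } b\text{)} \\
M_{\mathrm{in}}(b) &= \{(c, b) \mid (c, b) \in E\} && \text{(incoming marks to } b\text{)} \\
M_{\mathrm{out}}(b) &= \{(b, c) \mid (b, c) \in E\} && \text{(outgoing marks from } b\text{)} \\
E(b) &= G(b) \cup M_{\mathrm{in}}(b) \cup M_{\mathrm{out}}(b) && \text{(evolving neighborhood of } b\text{)}
\end{aligned}
$$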
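
These definitions translate almost directly into set operations. The sketch below is an illustration under assumptions, not the authors' tool: an interoperation is represented as a `frozenset` of two component names, a mark as an ordered `tuple`, and all function names (`interop`, `marks_in`, `marks_out`, and so on) are hypothetical.

```python
def interop(b, c):
    """Unordered couple {b, c}; the model requires b != c."""
    if b == c:
        raise ValueError("an interoperation joins two distinct components")
    return frozenset((b, c))


def is_mark(x):
    # Marks are ordered couples, stored here as tuples.
    return isinstance(x, tuple)


def is_eig(E):
    """Every mark (b, c) must lie on an existing interoperation {b, c}."""
    return all(interop(*m) in E for m in E if is_mark(m))


def G(E, b):
    """Interoperations of b."""
    return {x for x in E if not is_mark(x) and b in x}


def marks_in(E, b):
    """Incoming marks to b."""
    return {x for x in E if is_mark(x) and x[1] == b}


def marks_out(E, b):
    """Outgoing marks from b."""
    return {x for x in E if is_mark(x) and x[0] == b}


def neighborhood(E, b):
    """Evolving neighborhood of b."""
    return G(E, b) | marks_in(E, b) | marks_out(E, b)


def marked_components(E):
    """Components that still await a visit."""
    return {m[1] for m in E if is_mark(m)}


# Three components in a chain; "a" was changed and "b" awaits a visit.
E = {interop("a", "b"), interop("b", "c"), ("a", "b")}
```

For this small eig, `marked_components(E)` yields only `"b"`, and the neighborhood of `"c"` consists of the single interoperation between `"b"` and `"c"`.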



A visit to a component is the replacement of that component and its
neighborhood by an updated one. Formally, let $b$ and $b'$ be a component before and
after the visit, $E(b)$ and $E'(b')$ be the evolving neighborhood before and after
the visit, and $E$ and $E'$ be an eig before and after the visit, respectively. Then
a visit is a couple of eigs $(E, E')$ such that $E' = (E - E(b)) \cup E'(b')$. A scenario
is a sequence of evolving interoperation graphs $E_1, E_2, \ldots, E_n$ that starts and ends
with unmarked graphs, i.e. both $E_1$ and $E_n$ are unmarked. If there are no
backtracks, then for each step $i$, $(E_i, E_{i+1})$ is a visit. If there are backtracks in the
scenario, then either $(E_i, E_{i+1})$ is a visit, or for some $k$, $0 < k < i$, $E_{i+1} = E_k$.
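
A visit can be sketched as one set-rewriting step. The code below is a minimal sketch under assumptions: interoperations are `frozenset`s, marks are `tuple`s, the visited component keeps its name after the visit, and the helper names (`visit`, `is_unmarked`, `starts_and_ends_unmarked`) are hypothetical.

```python
def neighborhood(E, b):
    """E(b): interoperations (frozensets) and marks (tuples) involving b."""
    return {x for x in E if b in x}


def is_unmarked(E):
    return not any(isinstance(x, tuple) for x in E)


def visit(E, b, new_neighborhood):
    """Return E' = (E - E(b)) | E'(b')."""
    return (E - neighborhood(E, b)) | set(new_neighborhood)


def starts_and_ends_unmarked(scenario):
    return is_unmarked(scenario[0]) and is_unmarked(scenario[-1])


ab = frozenset(("a", "b"))
bc = frozenset(("b", "c"))

E1 = {ab, bc}
# Visit "a": it is changed and its only neighbor "b" is marked.
E2 = visit(E1, "a", {ab, ("a", "b")})
# Visit "b": it is updated and the change does not propagate further.
E3 = visit(E2, "b", {ab, bc})

scenario = [E1, E2, E3]
```

The intermediate eig `E2` is marked, which corresponds to the inconsistent state of the software in the middle of a change, while the scenario as a whole starts and ends unmarked.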

The strategy of a specific scenario is a set of constraints imposed on the visits. 
This paper considers finality, accuracy, and strictness. If visits are final, then we 
assume that after visiting component b and then its neighbor component c, com- 
ponent b does not need another immediate visit. A non- final strategy means that 
there may be an immediate need to revisit b again. Formally, this is expressed 
in the following way: 

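- For a non-final change, $M'_{\mathrm{in}}(b') = \emptyset$ and $M'_{\mathrm{out}}(b') \subseteq \{(b', c) \mid \{b', c\} \in E'(b')\}$.

- For a final change, $M'_{\mathrm{in}}(b') = \emptyset$ and $M'_{\mathrm{out}}(b') \subseteq \{(b', c) \mid \{b', c\} \in E'(b')\} - \{(b', d) \mid (d, b) \in M_{\mathrm{in}}(b)\}$.

Here $M_{\mathrm{in}}$ and $M_{\mathrm{out}}$ are the sets of incoming and outgoing marks, and primed sets refer to the eig after the visit.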

The accuracy is dependent on the depth of program analysis. This paper assumes
that the program analysis is limited to a static analysis that extracts only
the components and the existence or nonexistence of interactions among them,
without any additional information. In this situation, we will distinguish between
two possibilities: Either the change propagates through the component to all
neighbors, or to none at all. The distinction cannot be made automatically by the
analysis tool, but must be made by the programmer, based on his/her knowledge
of semantics. In the first case, for a non-final change
$M'_{\mathrm{out}}(b') = \{(b', c) \mid \{b', c\} \in G'(b')\}$, and for a final change
$M'_{\mathrm{out}}(b') = \{(b', c) \mid \{b', c\} \in G'(b')\} - \{(b', d) \mid (d, b) \in M_{\mathrm{in}}(b)\}$.
In the second case, $M'_{\mathrm{out}}(b') = \emptyset$. This will be called coarse accuracy.

Another constraint is strictness. Strict strategies visit only marked compo- 
nents, and lenient strategies can visit any component at any time. 
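
The three constraints can be illustrated with two small functions. This is a hedged sketch: `outgoing_marks` and `may_visit` are hypothetical names, the visited component is assumed to keep its name and its set of neighbors, and the decision whether the change propagates is passed in as a flag because, under coarse accuracy, it is made by the programmer.

```python
def outgoing_marks(b, neighbors, marked_by, propagates, final):
    """Marks that component b places on its neighbors after a visit.

    neighbors:  components sharing an interoperation with b
    marked_by:  components whose marks pointed at b before the visit
    propagates: programmer's decision (coarse accuracy: all or none)
    final:      if True, b does not mark back the components that marked it
    """
    if not propagates:
        return set()
    targets = set(neighbors)
    if final:
        targets -= set(marked_by)
    return {(b, c) for c in targets}


def may_visit(component, marked_components, strict):
    """Strict strategies visit only marked components; lenient ones any."""
    return component in marked_components if strict else True


# Component "b" has neighbors "a", "c", "d" and was marked by "a".
final_marks = outgoing_marks("b", {"a", "c", "d"}, {"a"}, True, final=True)
nonfinal_marks = outgoing_marks("b", {"a", "c", "d"}, {"a"}, True, final=False)
no_marks = outgoing_marks("b", {"a", "c", "d"}, {"a"}, False, final=True)
```

The only difference between the final and non-final results is the mark pointing back to `"a"`, which is what signals a possible immediate revisit of the component that started the propagation.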

A more complete exposition of these concepts can be found in [17,20]. The 
concepts are illustrated by the following example. 



Example 

As an example of a change, consider an evolution of the calendar program of
Figure 1. The components of the program are classes and the interactions between
the components are relations of inheritance and aggregation among them.
Formally, the program is described by the eig

$$P_1 = \{\{\text{user}, \text{eventList}\}, \{\text{eventList}, \text{eventAbstr}\}, \{\text{eventAbstr}, \text{event}\}, \{\text{eventAbstr}, \text{time}\}, \{\text{time}, \text{date}\}\}$$

The functionality of the components and the interactions are explained in the
annotations in Tables 1 and 2. In this program, we are going to change the
insertion of a new appointment. If the user wants to enter a new appointment
that involves a weekend, he/she will be warned and prompted to confirm or
cancel such an appointment. This new functionality will be added to the program
in a change propagation scenario that uses a coarse, strict, and final strategy.




Fig. 2. Calendar program after change in class “user” 



The change starts in the class “user” by a modification of the method that 
inserts new appointments. The new method prompts the user to confirm or cancel 
the appointment when the dates of the appointment conflict with a weekend. 
The information about weekend comes from the class “eventList” and the old 
version of the class does not provide it. Hence after the change in class “user”,
the program is inconsistent and is represented by the eig in Figure 2, where the
arrow represents the mark (or inconsistency). The eig of Figure 2 is formally
described as $P_2 = P_1 \cup \{(\text{user}, \text{eventList})\}$.

The next class to be changed is the class “eventList”. The method to be 
changed is the method that includes a new appointment into the list of appoint- 
ments. The old version of that method returns a value that indicates whether the 
new appointment conflicts with any existing appointments. In the new version, 
the range of returned values has to be extended to indicate whether the new 
appointment conflicts with a weekend. The new version of the method invokes 
another method that will identify whether an event conflicts with a weekend. 
That method, like all time information, is inherited from “eventAbstr” and hence 
the change propagates further. The new eig after the visit is in Figure 3. Formally
it is described as $P_3 = P_1 \cup \{(\text{eventList}, \text{eventAbstr})\}$.




Fig. 3. Calendar program after change in “eventList” 



The class “eventAbstr” is an aggregate of two instances of “time”, one for
the beginning of the appointment and the other one for the end. The method
that determines whether the new appointment conflicts with a weekend has to
be added to “eventAbstr”. In accordance with the final and coarse strategy of
propagation, both “event” and “time” have to be marked; see the eig in Figure 4.
Formally it is described as $P_4 = P_1 \cup \{(\text{eventAbstr}, \text{event}), (\text{eventAbstr}, \text{time})\}$.




Fig. 4. Calendar program after visit to “eventAbstr”



The class “event” inherits all time information from “eventAbstr” and therefore
the visit confirms that “event” does not need any change. The visit to
the class “time” reveals that all information related to dates is inherited from
the class “date”, so no change is required in that class. In spite of that, the
change propagates further to the class “date”; see the new eig in Figure 5. Formally,
$P_5 = P_1 \cup \{(\text{time}, \text{date})\}$.

The class “date” is the last visit of the change propagation scenario. In this 
visit, we provide a new method that identifies for a given date the nearest follow- 
ing weekend date. After the method has been added, the program is consistent 
again, and now supports the new feature that identifies appointments conflicting 
with weekends. 

This completes the example of change propagation. Please note that all of 
the classes of the program had to be visited in the scenario. All of them except 
two had to be changed. One class (“time”) did not need any change, yet the
change propagated through it to the next class (“date”).
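
The scenario above can be replayed mechanically. The following sketch makes several assumptions: interoperations are `frozenset`s, marks are `tuple`s, each visited class keeps its name, the function name `coarse_final_visit` is hypothetical, and the per-class decision whether the change propagates is taken from the narrative of the example.

```python
def interop(b, c):
    return frozenset((b, c))


def coarse_final_visit(E, b, propagates):
    """One visit to b under the coarse and final strategy."""
    nbhd = {x for x in E if b in x}                      # E(b)
    edges = {x for x in nbhd if isinstance(x, frozenset)}
    marked_by = {m[0] for m in nbhd
                 if isinstance(m, tuple) and m[1] == b}  # sources of marks on b
    out = set()
    if propagates:
        for edge in edges:
            (c,) = edge - {b}            # the other end of the interoperation
            if c not in marked_by:       # final: do not mark back
                out.add((b, c))
    return (E - nbhd) | edges | out


P1 = {interop("user", "eventList"), interop("eventList", "eventAbstr"),
      interop("eventAbstr", "event"), interop("eventAbstr", "time"),
      interop("time", "date")}

# (class visited, does the change propagate through it?)
steps = [("user", True), ("eventList", True), ("eventAbstr", True),
         ("event", False), ("time", True), ("date", True)]

scenario = [P1]
for component, propagates in steps:
    scenario.append(coarse_final_visit(scenario[-1], component, propagates))
```

Under these assumptions, `scenario[1]`, `scenario[2]`, `scenario[3]`, and `scenario[5]` correspond to the eigs of Figures 2 to 5, and the last element is again the unmarked eig of Figure 1. The visit to “date” leaves no marks because its only neighbor, “time”, is the class that marked it.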




Fig. 5. Calendar program after visit to “time” 



5 Other Work 

In the previous sections, we described techniques dealing with program compre- 
hension and change propagation. This section briefly reviews other work related 
to these topics. 



5.1 Browsers 

Browsers are software tools that extract information about components and their 
interactions and store this information in a database. Examples of interactions 
are relationships between the definitions and the use of variables, procedures, 
types, etc. The user queries the database to navigate through the software, fol- 
lowing the dependencies among components. In order to enhance understanding, 
the results of the queries are often displayed graphically. Since change propagates 
through the dependencies from one component into another, browsers support 
change propagation scenarios. Examples of browsers are [5,15].

It should be noted that the dependencies among software components are
used in configuration management [21] for a simple form of change propagation: 
If a specific component changes, all components recursively depending on it must 
change. Compared to the scenarios of Section 4, this is a highly special case of 
change propagation, where the change propagates through the dependencies only 
in one direction, e.g. from the used to the using components. Note that in the 
example in Section 4, the change propagates in exactly the opposite direction. 
Another difference is that during change propagation scenarios
the interactions can be inconsistent, while configuration management assumes
consistency. Also, configuration management marks all components that
are recursively dependent on a changed component, while change propagation 
scenarios allow the change propagation to stop, without requiring a change in 
all recursively dependent components. 
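
The configuration-management style of propagation amounts to a transitive closure over the “is used by” relation. The sketch below is an illustration only: the function name `recursive_dependents` is hypothetical, and the direction of the `USES` edges is an assumption read off the annotations of the Calendar program (inheritance and aggregation are both treated as “uses”).

```python
from collections import deque

# "X uses Y" edges, assumed from the annotations of the Calendar program.
USES = {
    "user": {"eventList"},
    "eventList": {"eventAbstr"},
    "event": {"eventAbstr"},
    "eventAbstr": {"time"},
    "time": {"date"},
    "date": set(),
}


def recursive_dependents(uses, changed):
    """All components that directly or indirectly use `changed`."""
    # Invert the relation: for each component, who uses it?
    used_by = {component: set() for component in uses}
    for component, used in uses.items():
        for target in used:
            used_by[target].add(component)
    # Breadth-first search from the used to the using components.
    seen, queue = set(), deque([changed])
    while queue:
        for dependent in used_by[queue.popleft()]:
            if dependent not in seen:
                seen.add(dependent)
                queue.append(dependent)
    return seen


must_change = recursive_dependents(USES, "date")
```

With these edges, a change in “date” flags every other class, whereas a change in “user” flags none. The scenario of Section 4 moves the other way, from “user” towards “date”, and may stop early.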



5.2 Change Impact Analysis 

Change impact analysis is a process by which the programmers identify the
components that will be impacted by the change. The purpose of a change impact
analysis is to assess the cost and difficulty of a change before it is actually
undertaken. A representative selection of articles on change impact analysis appears
in [2].

Change impact analysis includes concept location, by which a programmer 
locates a specific concept in the code (for example the conversion of a date to 
a day). Change requests are often formulated in terms of domain concepts, there- 
fore the recognition and location of concepts in the code is a key part of change 
impact analysis. In [1], the authors discuss a scenario of concept recognition 
based on reading a static program. Their strategy utilizes various clues, includ- 
ing identifier names and clusters of the function calls that reveal the location of 
the concept in the code. 

In [23], the features in the code are located by dynamic analysis. For that, 
the program is instrumented in such a way that the statements executed in a test 
are marked, and statements that were not executed are left unmarked. Then the 
program is executed two times, once with the feature being executed and once 
without the feature. The feature must be located within the statements that
were executed the first time but not the second time.
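
The idea of comparing two executions can be sketched with Python's tracing hook. This is not the instrumentation used in [23]; it is a minimal illustration in which `executed_lines` and the toy function `store` are hypothetical, and statements are identified by their line offset from the start of the function.

```python
import sys


def executed_lines(func, *args):
    """Run func(*args) and return the line offsets executed inside func."""
    code = func.__code__
    hits = set()

    def tracer(frame, event, arg):
        if frame.f_code is code:
            if event == "line":
                hits.add(frame.f_lineno - code.co_firstlineno)
            return tracer  # keep tracing lines of this frame
        return None        # ignore all other frames

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        func(*args)
    finally:
        sys.settrace(previous)
    return hits


def store(appointment_day):
    message = "stored"
    if appointment_day in ("Sat", "Sun"):
        message = "stored after weekend warning"
    return message


with_feature = executed_lines(store, "Sat")     # test that exercises the feature
without_feature = executed_lines(store, "Mon")  # test that does not
feature_lines = with_feature - without_feature
```

The set difference contains only the offset of the statement that issues the weekend warning, so the search for the feature is narrowed to that statement.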



5.3 Evolvable Architectures 

Since software evolution is inevitable, it is very important to make program 
architectures evolvable. Since program comprehension and change propagation 
are the most expensive parts of changes, the programs will be more evolvable if 
they are more comprehensible and if the change propagation is short. 

The programs can be made more comprehensible by having important 
domain concepts localized inside program components. A large part of program 
comprehension effort is geared towards establishing the traceability between
domain concepts and software components, and an architecture that has a simple
traceability is easier to comprehend. The change requests are often framed
in terms of domain concepts, as we saw in the example of the Calendar where 
the domain notions “appointment” and “weekend” were used. If the changing 
concept is fully localized within one component, the change propagation will be 
short. This is the thrust of the recommendation of [12]. However it should be 
noted that the concepts are often intertwined in various ways, and while some 
concepts are localized, others are inevitably delocalized. 

The studies of evolvable software architectures sometimes provide surprising
answers. In the case study of [18], we studied the evolvability of a software
repository, i.e. a program that is data oriented and in which the user manipulates
the data through use cases selectable from a menu. A typical evolution of such a system
is the addition of new use cases, which allow additional data manipulations.
We found that in this case, a program with use cases implemented as C
functions and data implemented as SQL relational tables was more evolvable
than a program with the same functionality implemented as a C++ object-oriented
program. The reason is that the use cases are the
most important concepts in this case, and in object-oriented programs they are
divided into small functions and delocalized into separate classes. Hence in this
particular case, the functional/relational implementation has a better concept
locality than the corresponding object-oriented program.

6 Conclusions and Future Work 

In this paper, we gave an overview of selected topics in software change and 
evolution. The study of this topic gives insight into how software changes are 
made and how they can be improved. 

A topic for future research is the implementation of a tool that supports 
change propagation. Such a tool must be able to analyze the program and store 
the results of the analysis in a database, as browsers do. However, it has
to have several additional capabilities. It should be able to analyze inconsistent
programs, i.e. programs where interactions between the components are incon- 
sistent, as is often the case during the change propagation. Also it should be 
able to support different strategies of change propagation, by keeping the marks 
and automatically updating them based on the selected strategy. 

The strategies and scenarios of change propagation are also topics for future 
research. In this paper, we discussed the finality, strictness, and accuracy of sce- 
narios, but there are additional constraining factors, some of them dependent on 
a deeper analysis of the program. This deeper analysis should take into account 
component and interaction semantics and improve the accuracy of the marking 
after a change. That would relieve the programmer from the necessity to visit 
components that do not need change and do not propagate change. 

The situation where a component does not need a change but propagates it to
its neighbors is intuitively a very likely place where the programmer may make
an error. This is supported by observations from practice, where one
of the most common sources of errors is a forgotten update. Again, a deeper
semantic analysis of the programs may warn about some of these situations.

Research in software change and evolution deals with an essential property of 
software. In this paper, we presented two specific solutions to partial problems: 
partitioned annotations of software for program comprehension, and evolving 
interoperation graphs for change propagation. The research topics in this field 
are interesting and applicable to programming practice. That should attract
researchers to this field.



1 Introduction

Many fundamental problems from natural sciences deal with complex systems. 
We define a complex system as a population of unique elements with well defined 
microscopic attributes and interactions, showing emerging macroscopic behav- 
ior. This emergent behavior can, in general, not be predicted from the individual 
elements and their interactions. A typical example of emergent behavior is self-
organization, e.g. Turing patterns in reaction-diffusion systems. Complex sys-
tems are often irreducible¹ and cannot be solved analytically. The only
available option to obtain more insight into these systems is through explicit sim- 
ulation. Many of these problems are intractable: in order to obtain the required 
macroscopic information, extensive and computationally expensive simulation 
is necessary. Since simulation models of complex systems require an enormous 
computational effort, the only feasible way is to apply massively parallel compu- 
tation. A major challenge is to apply High Performance Computing in research 
on complex systems and, in addition, to offer a parallel computing environment 
that is easily accessible for applications [62,63]. 

Traditionally, science has studied the properties of large systems composed of 
basic entities that obey simple microscopic equations reflecting the fundamental 
laws of nature. These natural systems may be studied by computer simulations 
in a variety of ways. Generally, the first step in any computer simulation is to 
develop some continuous mathematical model that is subsequently discretized 
for implementation on a computer. An alternative, less widely used approach is 
to develop solvers that conserve the characteristic intrinsic parallel properties 
of the applications and that allow for optimal mapping to a massively parallel 
computing system. These solvers have the property that they map the paral-
lelism in the application via a simple transformation to the parallelism in the
machine. With these transformations it is no longer necessary to express the
application in complex mathematical formulations.

¹ Irreducible problems can only be solved by direct simulation.

J. Pavelka, G. Tel, M. Bartosek (Eds.): SOFSEM’99, LNCS 1725, pp. 203-248, 1999.
© Springer-Verlag Berlin Heidelberg 1999

One example is the modeling of a fluid flow. Traditionally this problem is sim-
ulated through a mathematical description of the phenomenon via the Navier-Stokes
equations, and discretization of these equations into numerical constructs for
algorithmic presentation on a computer. This process of simulation involves a 
number of approximations and abstractions to the real fluid flow problem: intrin-
sic properties and explicit information of the physical phenomenon are obscured.
Even worse, the possible implicit parallelism of the problem becomes completely 
indistinct in the abstraction process. An alternative approach would be to model 
the microscopic properties of the fluid flow with cellular automata, where the 
macroscopic processes of interest can be explored through computer simulation. 
This approach has the advantage that the physical characteristics of the fluid 
flow problem remain visible in the solving method and that the method con-
serves the parallelism in the problem. Although this type of simulation method
is not yet completely understood and certainly not fully exploited, it is of cru-
cial importance when massively parallel computers are concerned. We define this
type of solver as a natural solver. These techniques have in common that they
are inspired by processes from nature [64]. Important examples of natural solvers
are Genetic Algorithms (inspired by the process of natural selection), Simulated
Annealing (inspired by the process of cooling heated material, which converges to
a state of minimal energy), Lattice Gases and the Lattice Boltzmann method (a
many-particle system, or cellular automaton method, with a macroscopic behav-
ior that corresponds to the hydrodynamic equations), and artificial Neural Net-
works. We argue that in parallel computing the class of natural solvers is
a very promising approach, since the physical characteristics of the original
phenomenon remain visible in the solving method and the implicit and explicit
parallelism of the problem remains conserved.

In Fig. 1 a “bird’s eye view” of the different steps of the mapping process 
from application to parallel machines is presented. As can be seen, an application 
is first transformed into a solver method. Here, detailed knowledge of the prob- 
lem domain is obligatory. Next, the intrinsic parallelism in the solver is passed 
through the Decomposition layer that captures the parallelism and dependen- 
cies into objects and communication relationships. Finally these two classes are 
mapped onto a Virtual Parallel Machine model that allows for implementation 
on a large suite of parallel systems [52]. 

To be able to capture the generic aspects of parallel solvers and to express 
the basic properties of the natural system, we will define our own abstract solver 
model, referred to as the Virtual Particle model. The Virtual Particle (VIP) can
be defined as the basic element in the simulation model. The VIP can be defined 
on several levels of abstraction. For example in a simulation model of a bio- 
logical system, the VIP can correspond to a certain level of organization and 
aggregation in the system (e.g. molecule-organelle-cell-tissue-organ-organism- 
population). The choice of the abstraction level is determined by a combina-
tion of the desired refinement of the model and the computational requirements. In
the VIP model the microscopic, temporal or spatial, rules have to be specified 
in such a way that they approximate the microscopic rules as observed in the 
actual system. In the VIP model, the VIPs may correspond to the individual 
particles in the natural solver, as for example in lattice gases. Alternatively, 

Fig. 1. Outline of a Parallel Programming Model for Dynamic Complex Systems
on Massively Parallel Computers 



the particles can be organized hierarchically, where VIPs can be an individual 
particle or clusters of VIPs. The application model is mapped onto the Virtual 
Parallel Machine Model (see Fig. 2), which can be another instance of a dynamic 
complex system consisting of a population of processors. In this case both load 
balancing and minimization of communication can be taken into account in a 
graph representation [56,59]. 

In this paper we will focus on cellular automata methods for modeling phe- 
nomena from natural sciences. In Section 2 the theoretical background of Cellular 
Automata (CA) will be briefly discussed. In Section 3 different execution mod- 
els for CA will be discussed. Section 4 presents an example of a very specific 
CA that can be used as a model of fluid flow. In Section 5 we will demon- 
strate the use of two types of execution models. The first application shows how 
synchronous cellular automata can model growth processes in a moving fluid. 
The second application demonstrates an asynchronous execution scheme for a 
continuous-time Ising spin model. 

Fig. 2. Basic structure of Dynamic Complex System Paradigm: the mapping
of the application model onto the machine model. The internal transformation 
denotes the mapping of the application graph onto the machine graph 

2 Background of Cellular Automata Concepts 

2.1 Introduction 

Cellular Automata are discrete, decentralized, and spatially extended systems 
consisting of large numbers of simple identical components with local connec-
tivity. Discrete here means that space, time, and the features of an
automaton can take only a finite number of states. The rationale of cellular
automata is not to describe a complex system from a global point of view,
as is done with, for instance, differential equations, but to model the
system starting from the elementary dynamics of its interacting parts. In other
words, the aim is not to describe a complex system with complex equations, but to
let the complexity emerge from the interaction of simple individuals following simple rules.
In this way, a physical process may be naturally represented as a computational 
process and directly simulated on a computer. The original concept of cellular 
automata was introduced by von Neumann and Ulam to model biological repro- 
duction and crystal growth respectively [66,68]. Von Neumann was interested 
in the connections between biology and computation. Specifically the biological 
phenomenon of self-reproduction modeled by automata triggered his research in 
this field. According to Burks [7], Stanislaw Ulam suggested the notion of cellu- 
lar automata to von Neumann as a possible concept to study self-reproduction. 
Since then it has been applied to model a wide variety of (complex) systems, in 
particular physical systems containing many discrete elements with local inter- 
actions [44,73]. Cellular Automata have been used to model fluid flow, galaxy 
formation, biological pattern formation, avalanches, traffic jams, parallel com-
puters, earthquakes, and many more. In these examples, simple microscopic rules
display macroscopic emergent behavior. For some Cellular Automata it can be
proven that they are equivalent to Universal Computers, and thus in principle able
to compute any given algorithm, comparable to Turing’s Universal Computing
Machine (see for instance [5]). Furthermore, Cellular Automata can provide an
alternative to differential equations for the modeling of physical systems. It is 
this combination of 

— simple local rules, 

— association with universal computing automata, 

— alternative to differential equations, 

— models of complex systems with emergent behavior, 

— and bridging the gap between microscopic rules and macroscopic observables 

that has renewed interest in Cellular Automata (CAs). The locality of the rules
facilitates parallel implementations based on domain decomposition, the Univer-
sal Computing behavior supports fundamental research into the intrinsics of
computation in CAs, and the modeling power of CAs is of utmost importance
for the study of a huge variety of complex systems. Although John von Neumann intro-
duced cellular automata theory several decades ago, only in recent years has it
become significant as a method for modeling and simulation of complex systems.
This occurred due to the implementation of cellular automata on massively par- 
allel computers. Based on the inherent parallelism of cellular automata, these 
new architectures made possible the design and development of high-performance 
software environments. These environments exploit the inherent parallelism of 
the CA model for efficient simulation of complex systems modeled by a large 
number of simple elements with local interactions. By means of these environ- 
ments, cellular automata have been used recently to solve complex problems in 
many fields of science, engineering, computer science, and economics. In partic-
ular, parallel cellular automata models are successfully used in fluid dynamics,
molecular dynamics, biology, genetics, chemistry, road traffic flow, cryptography, 
image processing, environmental modeling, and finance [65]. 



2.2 Simple 1D Cellular Automata

A CA consists of two components: a lattice of $N$ identical finite-state machines
called cells, each with an identical pattern of local connections to other cells for
input and output, and a transition rule. Let $\Sigma$ denote the set of states in the
cell’s finite state machine, and $k = |\Sigma|$ denote the number of states per cell. The
state of cell $i$ at time $t$ is denoted by $s_i^t$, with $s_i^t \in \Sigma$. The state of cell $i$ together
with the states of the cells to which cell $i$ is connected is called the neighborhood
$\eta_i^t$ of cell $i$. The transition rule $\phi(\eta_i^t)$ gives the updated state for each cell $i$
as a function of $\eta_i^t$. In a CA a global clock usually provides the update signal for
all the cells; cells update their states synchronously. In Section 3 we will discuss
the consequences of asynchronous updates.

Fig. 3. The “space-time” behavior of CA 110, where black = 0 and white = 1.
Space is on the horizontal axis, time flows from top to bottom 



Consider for instance the following one-dimensional CA with $k = 2$ (i.e.
$\Sigma = \{0, 1\}$) and a transition rule given by

Neighborhood: 111 110 101 100 011 010 001 000
Output bit:     0   1   1   0   1   1   1   0

The transition rule number (due to Wolfram [73]) is then “110”, the
decimal representation of the output bit-string “01101110”. The “space-
time” behavior of CA 110 is shown in Fig. 3. Starting from a random initialization
of a CA of length 250, the transition rule is applied iteratively for 50 time steps.
Time runs from top to bottom. For a one-dimensional CA the size of the
neighborhood $\eta_i^t$ is given by $2r + 1$, with $r$ the radius of the CA (in CA 110, $r = 1$).
Wolfram studied in great detail the 256 possible one-dimensional $k = 2$, $r = 1$
CAs (so-called elementary CAs) and classified them, by analogy with dynamical
systems, into four classes [72]:

— Class 1: Fixed point behavior. Almost all initial configurations relax after a 
transient period to the same fixed configuration. 

— Class 2: Periodic behavior. Behavior like in class one but with temporally 
periodic cycles of configurations included. 

— Class 3: Chaotic behavior. Unpredictable space-time behavior. 

— Class 4: Complex behavior. Complicated localized patterns occur, sometimes 
“long-lived”. 

For infinite Class 4 CAs it is effectively undecidable whether a particular
rule operating on a particular initial seed will ultimately lead to a frozen state
or not; this is the CA analog of Turing’s Halting problem. It is speculated that
all CAs with Class 4 behavior are capable of universal computation [72]. In 1990
Lindgren and Nordahl showed for the first time that a one-dimensional
CA ($r = 1$, $k = 7$) exhibits universal computing behavior [38]. If Class 4 CAs are
capable of universal computing, then CA 110, shown in Fig. 3, is a very good
elementary CA candidate for universal computation [37].
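
The update of an elementary CA described in this section can be sketched in a few lines of Python. This is only an illustrative sketch: the function names `make_rule_table`, `step` and `run` are our own, and periodic boundary conditions are assumed because the text does not specify how the lattice ends are treated.

```python
import random


def make_rule_table(rule_number):
    """Decode a Wolfram rule number into 8 output bits.

    Entry n is the output bit for the neighborhood whose three cells,
    read as a binary number, equal n (e.g. n = 6 is neighborhood 110).
    """
    return [(rule_number >> n) & 1 for n in range(8)]


def step(cells, table):
    """Apply the rule synchronously to all cells (periodic boundaries assumed)."""
    n = len(cells)
    return [
        table[4 * cells[(i - 1) % n] + 2 * cells[i] + cells[(i + 1) % n]]
        for i in range(n)
    ]


def run(cells, rule_number, steps):
    """Return the space-time history: the initial row plus one row per time step."""
    table = make_rule_table(rule_number)
    history = [list(cells)]
    for _ in range(steps):
        cells = step(cells, table)
        history.append(cells)
    return history


def random_configuration(length, seed=None):
    """Random initial configuration over the state set {0, 1}."""
    rng = random.Random(seed)
    return [rng.randint(0, 1) for _ in range(length)]
```

Calling `run(random_configuration(250), 110, 50)` yields a space-time history with the lattice length and number of time steps quoted for Fig. 3.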

2.3 Requirements for Computability in Cellular Automata

Some authors [13,35,72] suggest that when a system displays complex behavior, 
universal computations can be performed. Mechanically speaking, a computa-
tional system requires transmission, storage and modification of information.
So, whenever we identify those three components in some dynamical system, the
system could be computationally universal. But the question remains: when
does this happen? Loosely speaking we can say, using information-theoretical
results, that it must take place at an intermediate level of entropy: stored infor- 
mation lowers the entropy, but transmission of information increases the entropy 
level. Therefore we will briefly review entropy measures of Cellular Automata in 
Section 2.3.1. In a number of papers, Christopher Langton has tried to answer 
this question by considering Cellular Automata as a theoretical model for a 
physical system. The hypothesis “Computation at the Edge of Chaos” resulted 
from this research. Briefly it states that universal computations can take place 
at the border between order and chaos. This statement partly resulted from the 
observation that correlations can become infinite during, or at, a second-order
phase transition between, for example, a solid and a liquid phase. Recall that
a discontinuous change in an order parameter of the system corresponds to a 
first order transition. A sudden, but continuous, change corresponds to a second 
order transition. At such a transition, the system is in a critical state. Langton 
and Kauffman [33,35] believe that the resulting infinite correlations can be inter- 
preted as long-term memory needed to store information. We will review these 
notions and recent objections to this hypothesis briefly in Section 2.3.2. 



2.3.1 Information in CA In order to observe phase transitions in CA evolu-
tion, quantitative order parameters are needed. These order parameters need to
distinguish between ordered and disordered states. A commonly used quantity
for this purpose is the Shannon entropy [60], defined on a discrete probability
distribution $p_i$:

    $H = -\sum_i p_i \log p_i$    (2)

This measure $H$ can be associated with the degree of uncertainty about a system.
In a Cellular Automaton, entropy can be defined on the possible subsequences
of $N$-length blocks in a $k$-state CA. In a random sequence all subsequences
must occur with equal probability. With probabilities $p_i$ for the $k^N$ possible
subsequences:

    $H_N = -\sum_{i=1}^{k^N} p_i \log p_i$    (3)

The spatial block entropy [21] is now defined as:

    $h_N^{(x)} = H_{N+1} - H_N$    (4)

The superscript $(x)$ indicates that spatial sequences are considered. From Eq. 4
follows the spatial measure entropy, or entropy rate [21]:

    $h^{(x)} = \lim_{N \to \infty} h_N^{(x)}$    (5)

The measure entropy gives the average information content per site. Analogous
to the spatial entropy, one can define the temporal entropy, where blocks of $N \times T$
sites are considered:

    $h^{(t)} = \lim_{N,T \to \infty} h_{N,T}^{(t)}$    (6)

Eq. 4 decreases monotonically with $N$, while $h_{N,T}^{(t)}$ decreases with $T$. The differ-
ence,

    $\delta h_N^{(x)} = h_N^{(x)} - h_{N+1}^{(x)}$    (7)

is the amount of information by which the state $s_{i+N}$ of cell $i + N$ becomes
less uncertain if the cell state $s_i$ is known. $\delta h_N^{(x)}$ is called the $N$-th order mutual
information in space. Intuitively one could regard mutual information as the
stored information in one variable about another variable and the degree of
predictability of a second variable by knowing the first.
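
The block entropy, the spatial block entropy and the mutual information in space can be estimated numerically from a single CA configuration. The sketch below is one possible way to do this, under assumptions that are ours rather than the text's: block probabilities are estimated from the relative frequencies of overlapping blocks in one finite configuration, logarithms are taken to base 2, and the function names are hypothetical.

```python
from collections import Counter
from math import log2


def block_entropy(seq, n):
    """Estimate H_N: Shannon entropy of the length-n blocks observed in seq."""
    if n == 0:
        return 0.0
    blocks = [tuple(seq[i:i + n]) for i in range(len(seq) - n + 1)]
    total = len(blocks)
    counts = Counter(blocks)
    # Relative block frequencies serve as estimates of the probabilities p_i.
    return -sum((c / total) * log2(c / total) for c in counts.values())


def spatial_block_entropy(seq, n):
    """Estimate h_N = H_{N+1} - H_N."""
    return block_entropy(seq, n + 1) - block_entropy(seq, n)


def mutual_information(seq, n):
    """Estimate the N-th order mutual information h_N - h_{N+1}."""
    return spatial_block_entropy(seq, n) - spatial_block_entropy(seq, n + 1)
```

For a perfectly periodic configuration such as 0101... the single-site entropy is one bit, while the block entropy difference is close to zero, since each site is fully determined by its neighbor.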

The space-time (random) processes that occur in deterministic CA can also
be studied through the Kolmogorov-Sinai entropy per unit time [19], the so-called
$(\epsilon, \tau)$-entropy:

    $h^{\mathrm{space,time}}(\epsilon, \tau) = \lim_{T,V \to \infty} \frac{1}{TV} H(\epsilon, \tau, T, V)$,

where $H$ denotes the entropy of the process over a simulation time $T$ and
volume $V$. For instance, $H(T, V) \sim V \log T$ for rule 132, and for rule 250, where
the initial state is attracted towards a spatially uniform (or periodic) configuration,
$H(T, V) \sim \log TV$. In all cases the entropy per unit time and unit volume
$h^{\mathrm{space,time}}$ is vanishing.

2.3.2 Computation in Cellular Automata If we have a $k$-state CA with
a neighborhood size $r$, the total number of possible transition rules is $k^{k^r}$,
which can become very large, even for a moderate number of states and/or a
small neighborhood. If a structure is present in this enormous space, it should
be possible to identify areas of equal complexity (i.e. the Wolfram classes) and
show how these areas are connected to each other. Using this ordering one can
locate those areas which support the transmission, storage and modification of
information. Langton [35,36] suggested the parameter $\lambda$ to structure the CA
rule-space. An arbitrary state $s \in \Sigma$ is designated the quiescent state $s_q$. Let
there be $n$ transitions to this quiescent state in an arbitrary transition rule. The
remaining $k^r - n$ transitions are filled randomly by picking uniformly over the
other $k - 1$ states:

    $\lambda = \frac{k^r - n}{k^r}$    (8)

If $\lambda = 0.0$ then all transitions in the rule will be to the quiescent state $s_q$. If
$\lambda = 1.0$ there will be no transitions to $s_q$. All states are represented equally in the rule
if $\lambda = 1 - 1/k$. With the aid of the $\lambda$-parameter it should be possible to examine
the assumption that complex behavior is located at the intermediate regime
between ordered and disordered behavior. The spectrum of dynamical behavior
can be explored with the so-called table-walk-through method, which increases
the $\lambda$-parameter at successive time steps. At each new time step a transition
table is incrementally updated using the transition table at the previous time
step. Because the described method is actually a “random walk” through a coarse
grained version of the CA state-space, each table-walk displays quantitatively
different behavior. Several measures can be used to characterize the dynamical
behavior of the CA at each new value of the $\lambda$-parameter. These measures include
the numerical determination of block entropies, and both temporal and spatial
mutual information statistics.
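
The definition of the $\lambda$-parameter in Eq. 8 can be made concrete with a small sketch. The representation of a transition rule as a dictionary from neighborhood tuples to output states, the function names, and the choice of state 0 as the quiescent state are all assumptions made for illustration; this is not Langton's own procedure for walking through rule space.

```python
import itertools
import random


def langton_lambda(table, quiescent=0):
    """Eq. 8: fraction of transitions that do not lead to the quiescent state."""
    n = sum(1 for out in table.values() if out == quiescent)
    return (len(table) - n) / len(table)


def random_rule_table(k, r, n_quiescent, quiescent=0, seed=None):
    """Build a k-state rule table over neighborhoods of size r (k >= 2).

    Exactly n_quiescent randomly chosen neighborhoods map to the quiescent
    state; the remaining k**r - n_quiescent entries are drawn uniformly
    from the other k - 1 states.
    """
    rng = random.Random(seed)
    neighborhoods = list(itertools.product(range(k), repeat=r))
    rng.shuffle(neighborhoods)
    others = [s for s in range(k) if s != quiescent]
    table = {}
    for index, neighborhood in enumerate(neighborhoods):
        if index < n_quiescent:
            table[neighborhood] = quiescent
        else:
            table[neighborhood] = rng.choice(others)
    return table
```

For a 4-state CA with neighborhood size 2 there are 16 table entries, so choosing 4 quiescent transitions gives $\lambda = 0.75 = 1 - 1/k$.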

At intermediate values of $\lambda$, i.e. at the edge between ordered and disordered
dynamics, several events seem to occur: 

— transient lengths grow rapidly, analogously to the physical event of critical 
slowing down,

— transient lengths depend exponentially on the size of the CA, 

— mutual information measures (see Eq. 7) reach their maximum values, see 
Fig. 4 (left), at the entropy transition, see Fig. 4 (right). 





Fig. 4. Temporal mutual information between two sites separated by one time
step, and site entropy, both for 4-state 2-neighbor Cellular Automata



The exponential dependence of transient lengths on the size of the CA is anal-
ogous to the exponential dependence on problem size in the NP and PSPACE
complexity classes. As with halting computations, it will be formally undecid-
able for an arbitrary CA in the vicinity of a phase transition whether transients
will ever die out. The increase in mutual information indicates that the correla-
tion length is growing, which is further evidence for a phase transition in
that region. Of course we cannot observe a real phase transition other than in the
thermodynamic limit. Other discussions of the “Edge of chaos” hypothesis can
be found in the work of Crutchfield on continuous dynamical systems [12] and
the resulting ε-machine reconstruction. In [14] the so-called intrinsic computation
abilities of a continuous dynamical system are investigated. This term refers to
structures that emerge in a system’s behavior that can be understood in compu-
tational terms. The output of the system (e.g. an iterative map, $x_{n+1} = f(x_n)$) in
time is coarse grained into a sequence of zeros and ones. In other words, the out-
put domain $x_n$ is divided into two regions, $P_0 = \{x_n < x_c\}$ and $P_1 = \{x_n > x_c\}$,
where $x_c$ is an arbitrarily chosen division point. The complexity of the dynami-
cal system is quantified by construction of the minimal regular language which
accepts the generated sequence. The complexity and entropy (see Eq. 2) for the
logistic map were examined in [14] using the method of regular language complex-
ity (size of the corresponding finite automaton). It was found that the lowest
values of complexity correspond to the periodic and fully chaotic regimes of
the map. The highest value of the complexity occurs where the period-doubling
cascade of the map meets the band-merging cascade, i.e. at the border between
order and chaos. In [17] the work of Langton and Crutchfield is complemented
by examining the dynamical behavior of a well-known computational device:
the Turing Machine (TM). A class of 7-state 4-symbol Turing machines, which
also includes Minsky’s universal Turing machine [46], was used to address the
question whether universal computation is found between order and chaos. A
large number of randomly created TMs was used to generate three different
sequences: a sequence of symbols read, a sequence of states and a sequence of
moves made by the TM head. For all these sequences, the corresponding regular
language complexity was calculated using the technique of ε-machine reconstruc-
tion and plotted against its block entropy (see Eq. 4). It was found that the most
complex TMs are indeed located at intermediate values of the entropy, including
Minsky’s universal TM. Mitchell et al. reviewed this idea of computation at the
“edge of chaos” and reported on experiments producing very different results
from the original experiment by Packard [54]; they suggest that the interpre-
tation of the original results is not correct [48]. Those negative results did not
disprove the hypothesis that computational capability can be correlated with
phase transitions in CA rule space; they showed only that Packard’s results did
not prove the hypothesis [47]. All in all, this is still an open research question
that might have a large impact on the understanding of computation in CAs.
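
The coarse graining of an iterated map into a sequence of zeros and ones, as used in the intrinsic-computation studies discussed above, can be sketched as follows. The sketch uses the logistic map $x_{n+1} = a x_n (1 - x_n)$; the division point $x_c = 0.5$, the convention that a value exactly equal to $x_c$ is assigned the symbol 1, and the function name are assumptions for illustration only. The subsequent reconstruction of a minimal automaton is not shown.

```python
def logistic_symbol_sequence(a, x0, length, x_c=0.5):
    """Iterate x -> a*x*(1-x) and coarse grain each value into 0 or 1.

    Values below x_c fall in region P_0 (symbol 0); all other values are
    assigned to P_1 (symbol 1). The first symbol corresponds to x0 itself.
    """
    symbols = []
    x = x0
    for _ in range(length):
        symbols.append(0 if x < x_c else 1)
        x = a * x * (1.0 - x)
    return symbols
```

The resulting symbol sequence is the raw material from which a regular-language complexity or a block entropy could then be estimated.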



2.4 Modeling with Cellular Automata 

Cellular Automata can be an alternative to Differential Equations (DE) for the 
modeling of physical systems. To integrate a DE numerically, it must be dis- 
cretized in some way. This discretization is an approximation essentially equiv- 
alent to setting up a local discrete (dynamical) system that in the macroscopic 
limit reduces to the DE under consideration. The idea now is to start with a 
discrete system rather than a continuous description. There is no telling which 
model (a-priori discrete, or continuous) models the physics best. It is instructive 
to consider the example of the wave equation in one dimension [1]

    $\frac{\partial^2 f}{\partial t^2} = c^2 \frac{\partial^2 f}{\partial x^2}$    (9)

This equation has two types of solution that are waves traveling to the right
and to the left with wave vectors $k$ and frequencies $\omega_k = ck$:

    $f = \sum_k A_k e^{i(kx - \omega_k t)} + \sum_k B_k e^{i(kx + \omega_k t)}$    (10)

A particular solution is obtained by choosing coefficients $A_k$ and $B_k$:

    $f = A(x - ct) + B(x + ct)$    (11)

with

    $A(x) = \sum_k A_k e^{ikx}$    (12)

and $B(x)$ analogous, with $A(x)$ and $B(x)$ two arbitrary functions that specify
the initial conditions of the wave in an infinite space. We can construct a simple
one-dimensional CA analog of this wave equation. For each update, adjacent
cells are paired into partitions of two cells each, where the pairing switches from
update to update: the dynamics is completely given by switching the contents
of two adjacent cells in an update.
Given an arbitrarily chosen initial condition, it can be seen that the contents
of the odd cells move systematically to the right, whereas the contents of the even
cells move to the left, both with a constant velocity $c$. The dynamics of this CA
is the same as the dynamics of the wave equation in infinite space; we only need
to code the initial conditions in the cells appropriately. If a binary representation
of the cells is used, $\{1, -1\}$, then the local average over the odd cells represents
the right-traveling wave $A(x - ct)$, and the local average over the even cells
represents $B(x + ct)$.
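
The pairing-and-swapping dynamics of this wave-equation CA can be sketched as follows. The sketch assumes a lattice with an even number of cells and periodic boundaries (the text considers an infinite space), uses 0-based indexing, so that the cells called "odd" in the text are those at indices 0, 2, 4, ..., and uses function names of our own choosing.

```python
def swap_step(cells, step):
    """One update: pair adjacent cells and swap the contents of each pair.

    On even steps the pairs are (0,1), (2,3), ...; on odd steps the pairing
    shifts by one cell to (1,2), (3,4), ..., wrapping around periodically.
    """
    n = len(cells)
    if n % 2 != 0:
        raise ValueError("the lattice must contain an even number of cells")
    offset = step % 2
    new = list(cells)
    for left in range(offset, n + offset, 2):
        a, b = left % n, (left + 1) % n
        new[a], new[b] = cells[b], cells[a]
    return new


def run_wave_ca(cells, steps):
    """Apply swap_step for the given number of updates, alternating the pairing."""
    for t in range(steps):
        cells = swap_step(cells, t)
    return cells
```

Labeling the cells by their initial index makes the two counter-propagating streams visible: after two updates the content that started at index 0 sits at index 2, while the content that started at index 1 has moved two cells to the left.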

Another useful CA simulation is found in the study of excitable media. Win-
free et al. [70] discussed a simple 2D example that requires three symbols
denoting relevant states in a biological cell: Q for quiescent, E for excited, and
T for tired. If one of the neighbors is excited, Q is followed by E. After one
time step, an excited cell becomes tired, and it is then set back to Q. This rule gives
rise to propagating spirals that are qualitatively similar to those observed in
Belousov-Zhabotinsky reactions.
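
A minimal sketch of this three-state excitable-media rule is given below. The text does not specify the neighborhood or the boundary treatment, so the sketch assumes a four-cell (north, south, east, west) neighborhood and fixed boundaries beyond which no cell is excited; the function name is hypothetical.

```python
Q, E, T = "Q", "E", "T"  # quiescent, excited, tired


def excitable_step(grid):
    """Synchronously update a 2D grid of Q/E/T states."""
    rows, cols = len(grid), len(grid[0])
    new = [row[:] for row in grid]
    for r in range(rows):
        for c in range(cols):
            state = grid[r][c]
            if state == E:
                new[r][c] = T          # an excited cell becomes tired
            elif state == T:
                new[r][c] = Q          # a tired cell recovers
            else:
                # A quiescent cell fires if any of its four neighbors is excited.
                neighbors = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
                if any(
                    0 <= i < rows and 0 <= j < cols and grid[i][j] == E
                    for i, j in neighbors
                ):
                    new[r][c] = E
    return new
```

Starting from a single excited cell, the rule produces an expanding ring of excitation followed by a ring of tired cells; spiral waves require a broken wave front as initial condition.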

A beautiful extension to this 2D biological cellular model was first reviewed
by Celada et al. [8]. In this system model, B and T lymphocytes are modeled
together with antigen and antibody molecules. They showed simulations of cell
response to antigens, response regulation, the minimum number of MHC-type
molecules, and natural selection in the MHC-species repertoire.

One of the most successful efforts to use CAs for simulation of DEs is the 
lattice gas automaton to simulate hydrodynamics. This will be described in
Section 4. First, different parallel execution models for cellular automata will be
presented in the next section.

3 Execution Models for Cellular Automata 

3.1 Synchronous Cellular Automata versus Asynchronous Cellular 
Automata 

The cellular automata (CA) model is a conceptually simple and effective solver
for dynamic complex systems (DCS). From a modeler’s perspective, a CA model
allows the formulation of a DCS application in simple rules. From a computer 
simulation perspective, a CA model provides an execution mechanism that eval- 
uates the temporal dynamic behavior of a DCS given these simple rules. An 
important characteristic of the CA execution mechanism is the particular update 
scheme that applies the rules iteratively to the individual cells of the CA. The 
different update schemes impose a distinct temporal behavior on the model. Thus 
we must select the proper update mechanism that aligns with the dynamics of 
the model. 

In the previous discussion, the update mechanism of CAs was described as
synchronous and parallel. However, for certain classes of DCS, the tem-
poral dynamic behavior is asynchronous. In particular, systems with hetero-
geneous spatial and temporal behavior are, in general, most exactly mapped to
asynchronous models [3,41]. When asynchronous models are solved by CA, the
asynchronous temporal behavior must be captured by the update mechanism.
This class of CA is called Asynchronous Cellular Automata (ACA) [23,39,51,53]. 
The ACA model incorporates asynchronous cell updates, which are independent 
of the other cells, and allows for a more general approach to CA. With these 
qualifications, the ACA is able to solve more complicated problems, closer to 
reality [75]. 

Dynamic systems with asynchronous updates can be forced to behave in a 
highly inhomogeneous fashion. For instance in a random iteration model it is 
assumed that each cell has a certain probability of obtaining a new state and 
that cells iterate independently. As an example one can think of the continuous-
time probabilistic dynamic model for an Ising spin system [40].
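
The random iteration model mentioned above can be illustrated with a small sketch for a one-dimensional lattice. The three-cell neighborhood, the periodic boundaries, the single update probability `p` shared by all cells, and the function name are assumptions made for illustration; the text does not prescribe them.

```python
import random


def random_iteration_step(cells, rule, p, rng):
    """Each cell independently obtains a new state with probability p.

    `rule(left, center, right)` computes the new state from the current
    configuration; cells that are not selected keep their state.
    """
    n = len(cells)
    new = list(cells)
    for i in range(n):
        if rng.random() < p:
            new[i] = rule(cells[(i - 1) % n], cells[i], cells[(i + 1) % n])
    return new
```

With `p = 1` the sketch reduces to the synchronous update of the previous sections, and with `p = 0` nothing changes; intermediate values give the inhomogeneous behavior described in the text.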

3.2 Types of Simulation 

Essential to every model is the time base on which changes to the system 
state occur. Accordingly, models can be classified depending on their tempo- 
ral dynamic behavior [74]. A model is a continuous-time model when time flows 
smoothly and continuously. A model is a discrete-time model if time flows in
jumps of some specified time unit. 

Continuous-time models can be further divided into differential equation
and discrete-event classes. A differential equation model is a continuous-time
model where changes in the state occur smoothly and continuously in time. In 
a discrete-event model, even though time flows continuously, state changes can 
occur only at discrete points in time: time jumps from one event to the next, 
while these events can occur arbitrarily separated from each other. 

By the very nature of the CA rules that define the state transformations, the 
temporal dynamic behavior classes that are applicable to cellular automata are 
discrete-time and discrete-event. Figures 5(a) and 5(b) show the
differences between the temporal behavior of state changes in both classes.





(a) State changes in a discrete-time system model

(b) State changes in a discrete-event system model

Fig. 5. Temporal behavior of discrete-time and discrete-event system models



3.2.1 Discrete-Time Models and Time-Driven Simulation In discrete-time
models the progress of time is modeled by time advances of a fixed increment,
for example time is advanced in increments of exactly $\Delta t$ time units. The
execution mechanism that implements this temporal dynamic behavior is called
time-driven simulation, since the clock initiates the state transitions of each
individual cell in the CA. The execution mechanism in time-driven simulation is
characterized by an iterative loop that, after each update of the simulation
time, updates the state variables for the next time interval $(t, t + \Delta t]$.
The new state of a cell at time $t + \Delta t$ is calculated from the state of the
cell and its neighbors at time $t$.
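The iterative loop described above can be sketched as follows. This is only a minimal illustration, assuming a 1-dimensional lattice with periodic boundaries; the names `step`, `run_time_driven`, and `parity_rule` are hypothetical and do not come from the text.

```python
def step(cells, rule):
    """Return the states at t + dt, computed only from the states at t."""
    n = len(cells)
    # Periodic boundaries are an assumption of this sketch.
    return [rule(cells[(i - 1) % n], cells[i], cells[(i + 1) % n])
            for i in range(n)]


def run_time_driven(cells, rule, dt, t_end):
    """Advance the clock in fixed increments dt and update all cells each tick."""
    history = [(0.0, list(cells))]
    n_steps = int(t_end / dt)
    for k in range(1, n_steps + 1):
        cells = step(cells, rule)          # state update for (t, t + dt]
        history.append((k * dt, list(cells)))
    return history


def parity_rule(left, centre, right):
    """Illustrative local rule: new state is the XOR of the two neighbors."""
    return left ^ right


history = run_time_driven([0, 0, 1, 0, 0], parity_rule, dt=1.0, t_end=2.0)
for t, cells in history:
    print(t, cells)
```

Note that every cell is visited at every clock tick, whether or not its state changes.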

Time-driven simulation is the most widely known and applied approach for
the simulation of CA models and natural systems in general. However, when
using time-driven simulation one has to make sure that the time step
$\Delta t$ is small enough to capture every state change in the system model. This
might imply that we need to make $\Delta t$ arbitrarily small, which is not
acceptable with respect to the computational requirements involved. Therefore,
time-driven simulation is less appropriate for the simulation of discrete-event
models, as there may be many clock ticks in which no events occur.

3.2.2 Discrete-Event Models and Event-Driven Simulation The progress
of time in discrete-event models is modeled by the occurrence of instantaneous
state changes, called events, at random times and independently of each
other. The execution mechanism that implements this temporal dynamic behavior
is called event-driven simulation. In event-driven simulation, the progress of
simulation time depends on the occurrence of the next event. The event-driven
simulation execution mechanism maintains an ordered event list to hold expected
future events. The simulation time progresses from the current time to the next
scheduled event time. The simulation of one event may generate new events that
are scheduled for future execution.

An elegant and efficient characteristic of the event-driven simulation approach
is that periods of inactivity are skipped over by advancing the simulation
clock from one event time to the next event time. This is perfectly safe since,
by definition, all state changes occur only at event times. Therefore causality
and the validity of the simulation are guaranteed. The event-driven approach to
discrete systems is usually exploited in queuing and optimization problems.
However, as we will see next, it is also a paradigm for the simulation of
continuous-time systems.
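The ordered event list and the clock that jumps from event to event can be sketched with a priority queue. This is a minimal illustration, not a definitive implementation; the function `run_event_driven`, the `handler` callback, and the event labels are assumptions made for the example.

```python
import heapq


def run_event_driven(initial_events, handler, t_end):
    """Process (time, label) events in timestamp order until t_end."""
    event_list = list(initial_events)
    heapq.heapify(event_list)              # ordered list of future events
    processed = []
    while event_list and event_list[0][0] <= t_end:
        t, label = heapq.heappop(event_list)   # clock jumps to the next event
        processed.append((t, label))
        for new_event in handler(t, label):
            if new_event[0] < t:
                raise ValueError("an event cannot be scheduled in the past")
            heapq.heappush(event_list, new_event)
    return processed


def handler(t, label):
    """Hypothetical model: every "a" event schedules a "b" event 2.5 later."""
    return [(t + 2.5, "b")] if label == "a" else []


trace = run_event_driven([(1.0, "a"), (7.0, "a")], handler, t_end=10.0)
print(trace)
```

The interval between 3.5 and 7.0, in which nothing happens, costs no computation at all, in contrast to the time-driven loop.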

3.3 Parallel Simulation of Cellular Automata Models 

The parallelization of the CA, both for synchronous and asynchronous models,
is realized by geometric decomposition. That is, the individual cells of the CA
are aggregated into sublattices, which are mapped to the parallel processors.
However, the parallel synchronization mechanisms between the sublattices are
very different for synchronous and asynchronous CA models.

3.3.1 Parallel Synchronous Cellular Automata Simulation Similar to 
the sequential execution of synchronous CA, the cells in a parallel synchronous 
CA simulation undergo simultaneous state transitions under direction of a global 
clock. All cells must finish their state transition computations before any cell can 
start simulating the next clock tick. 

The parallelization of the discrete-time simulation is achieved by imitating
the synchronous behavior of the simulation. The simulation is arranged into a
sequence of rounds, with one round corresponding to one clock tick. Between
each round a global synchronization of all cells indicates that the cells have
finished their state change at time step $t$ and the new time step $t + \Delta t$
can be started.

Generally, the simulation proceeds in two phases: a computation and state
update phase, and a communication phase. The progression of time in
time-driven simulation is illustrated in Fig. 6.

Fig. 6. Time-driven simulation of a synchronous CA model, where computation
and communication phases succeed each other



3.3.2 Parallel Asynchronous Cellular Automata Simulation In parallel 
ACA simulation, state transitions (further called events) are not synchronized 
by a global clock, but rather occur at irregular time intervals. In these sim- 
ulations few events occur at any single point in simulated time and therefore 
parallelization techniques based on synchronous execution using a global simu- 
lation clock perform poorly. Concurrent execution of events at different points in 
simulated time is required, but this introduces severe synchronization problems. 
The progress of time in event-driven simulation is illustrated in Fig. 7. 




Fig. 7. Progress of simulation time in event-driven simulation. As the cells evolve
asynchronously in time, the simulation times of the individual cells are different



The absence of a global clock in asynchronous execution mechanisms
necessitates sophisticated synchronization algorithms to ensure that
cause-and-effect relationships are correctly reproduced by the simulator.
Parallel discrete event simulation is essentially concerned with the correct
ordering, or scheduling, of the asynchronous execution of events over parallel
processors. There are basically two methods to impose the correct temporal order
of the asynchronous event execution: conservative and optimistic methods.

First, the conservative approach proposed by Chandy and Misra [9] strictly
imposes the correct temporal order of events. Second, the optimistic approach,
introduced by Jefferson [24], uses a detection and recovery mechanism: whenever
an incorrect temporal order of events is detected, a rollback mechanism
is invoked to recover. Although both approaches have their specific application
areas, optimistic methods offer the greatest potential as a general-purpose
simulation mechanism.

In optimistic simulation, the parallel simulation processes execute events and 
proceed in local simulated time as long as they have any input at all. A con- 
sequence of the optimistic execution of events is that the local clock or Local 
Virtual Time (LVT) of a process may get ahead of its neighbors’ LVTs, and it 
may receive an event message from a neighbor with a simulation time smaller 
than its LVT, that is, in the past of the simulation process. The event causing 
the causality error is called a straggler. If we allow causality errors to happen, 
we must provide a mechanism to recover from these errors in order to guarantee 
a causally correct parallel simulation. Recovery is accomplished by undoing the 
effects of all events that have been processed prematurely by the process receiv- 
ing the straggler. The net effect of the recovery procedure is that the simulation 
process rolls back in simulated time. 

The premature execution of an event results in two things that have to be 
rolled back: (i) the state of the simulation process and (ii) the event messages 
sent to other processes. The rollback of the state is accomplished by periodically 
saving the process state and restoring an old state vector on rollback: the simu- 
lation process sets its current state to the last state vector saved with simulated 
time earlier than the timestamp of the straggler. Recovering from prematurely
sent messages is accomplished by sending an anti-message that annihilates the
original when it reaches its destination. The messages that are sent while the 
process is propagating forward in simulated time, and hence correspond with 
simulation events, are called positive messages. 
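The rollback of state and messages can be illustrated with a single toy process. The sketch below is an assumption-laden simplification: the class `OptimisticProcess` is hypothetical, the state is a single integer, the state vector is saved after every event rather than periodically, and anti-messages are merely collected in a list instead of being sent to a neighbor.

```python
class OptimisticProcess:
    """Toy simulation process: each event adds a value to an integer state."""

    def __init__(self):
        self.lvt = float("-inf")               # Local Virtual Time
        self.state = 0
        self.saved = [(self.lvt, self.state)]  # state vectors, saved per event
        self.processed = []                    # events executed so far
        self.sent = []                         # positive messages sent so far
        self.anti_messages = []                # cancellations emitted on rollback

    def execute(self, t, value):
        self.state += value
        self.lvt = t
        self.processed.append((t, value))
        self.saved.append((t, self.state))
        self.sent.append((t, self.state))      # positive message: (time, state)

    def rollback(self, t):
        # Restore the last state saved with a time earlier than the straggler.
        while self.saved[-1][0] >= t:
            self.saved.pop()
        self.lvt, self.state = self.saved[-1]
        # Cancel prematurely sent messages and return the events to re-execute.
        self.anti_messages += [m for m in self.sent if m[0] >= t]
        self.sent = [m for m in self.sent if m[0] < t]
        redo = [e for e in self.processed if e[0] >= t]
        self.processed = [e for e in self.processed if e[0] < t]
        return redo

    def receive(self, t, value):
        redo = self.rollback(t) if t < self.lvt else []   # t < LVT: straggler
        for event in sorted(redo + [(t, value)]):
            self.execute(*event)


p = OptimisticProcess()
p.receive(1.0, 10)
p.receive(3.0, 5)
p.receive(2.0, 100)    # straggler: arrives after the event at time 3.0 ran
print(p.state, p.lvt, p.anti_messages)
```

After the straggler is handled, the event at time 3.0 has been executed twice: once prematurely and once in the correct order.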

A direct consequence of the rollback mechanism is that more anti-messages 
may be sent to other processes recursively, and that it allows all effects of erro- 
neous computation to be eventually canceled. As the smallest unprocessed event 
in the simulation is always safe to process, it can be shown that this mechanism 
always makes progress under some mild constraints [24]. 

In optimistic simulation the notion of global progress in simulated time is 
administered by the Global Virtual Time (GVT). The GVT is the minimum of 
the LVTs for all the processes and the timestamps of all messages (including 
anti-messages) sent but unprocessed. No event with timestamp smaller than the 
GVT will ever be rolled back, so storage used by such events (i.e., saved state
vectors and event messages) can be discarded. Also, irrevocable operations such
as I/O cannot be committed before the GVT sweeps past the simulation time at
which the operation occurred. The process of reclaiming memory and committing
irrevocable operations is referred to as fossil collection.
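As a small illustration of the GVT definition and of fossil collection, consider the sketch below. The function names and the example numbers are hypothetical. One design choice of the sketch should be stated explicitly: the most recent state vector older than the GVT is retained, since a rollback to a time at or just after the GVT still needs an earlier saved state to restart from.

```python
def global_virtual_time(lvts, in_transit_timestamps):
    """Minimum over all LVTs and all sent-but-unprocessed message timestamps."""
    return min(list(lvts) + list(in_transit_timestamps))


def fossil_collect(saved_states, gvt):
    """Discard saved (timestamp, state) pairs that can never be needed again.

    saved_states must be sorted by timestamp. The newest entry older than
    the GVT is kept as a restart point (an assumption of this sketch).
    """
    older = [s for s in saved_states if s[0] < gvt]
    newer = [s for s in saved_states if s[0] >= gvt]
    return older[-1:] + newer


gvt = global_virtual_time(lvts=[12.0, 9.5, 15.0],
                          in_transit_timestamps=[8.0, 11.0])
kept = fossil_collect([(2.0, "s0"), (6.0, "s1"), (9.0, "s2"), (12.0, "s3")], gvt)
print(gvt, kept)
```

Here a message in transit, not a process clock, determines the GVT.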

To summarize, the parallel synchronous execution mechanism for discrete-time 
models mimics the sequential synchronous execution mechanism by interleaving 
a computation and state update phase with a synchronization and communi- 
cation phase. The parallel execution mechanism is fairly simple and induces a 
minimum of overhead on the computation. The parallel asynchronous execu- 
tion mechanism for discrete-event models, the so-called optimistic simulation 
method, is more expensive than its sequential counterpart. The synchronization 
mechanism in optimistic simulation requires extra administration, such as state 
saving and rollback. Despite this overhead, optimistic simulation is an efficient 
parallel execution mechanism for discrete-event models. In Section 5, two appli- 
cations are presented that are typical examples of respectively synchronous and 
asynchronous CA models. 

4 Cellular Automata as Models for Fluid Flow 

4.1 Introduction 

Section 2 introduced the basic idea behind Cellular Automata (CA) and exem- 
plified how one can reason about information and complexity in general CAs. 
Here we introduce a very specific CA, which, as will become clear later on, can 
be used as a model of fluid flow. This class of CA is called Lattice Gas Automata
(LGA), and they are described in detail in two recent books [11,58].

Suppose that the state of a cell is determined by $b_m$ surrounding cells.
Usually, only the nearest and next-nearest neighbors are considered. For example,
on a square lattice with only nearest neighbor interactions $b_m = 4$, if
next-nearest neighbors are also included $b_m = 8$, and on a hexagonal lattice
with nearest neighbor interactions $b_m = 6$. Furthermore, suppose that the state
of the cell is a vector $\mathbf{n}$ of $b = b_m$ bits. Each element of the state
vector is associated with a direction on the CA lattice. For example, in the case
of a square grid with only nearest neighbor interactions we may associate the
first element of the state vector with the north direction, the second with east,
the third with south, and the fourth with west. With these definitions we
construct the following CA rule (called the LGA rule), which consists of two
sequential steps:

1. Each bit in the state vector is moved in its associated direction (so in the
example, the bit in element 1 is moved to the neighboring cell in the north)
and placed in the state vector of the associated neighboring cell, in the same
position (so, the bit in element 1 is moved to element 1 of the state vector in
the cell in the north direction). In this step each cell is in fact moving bits
from its state vector in all directions, and at the same time is receiving bits
from all directions, which are stored into the state vector.
2. Following some deterministic or stochastic procedure, the bits in the state
vector are reshuffled. For instance, the state vector (1, 0, 1, 0) is changed to
(0, 1, 0, 1).

As a refinement, one may also introduce $b_r$ extra bits in the state vector
which, as it were, reside on the cell itself, i.e. are not moved to another cell in
step 1 of the LGA rule. In that case the length of the state vector is
$b = b_m + b_r$. These residing bits do however participate in the reshuffling
step 2.
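The two steps of the LGA rule can be sketched for the square lattice example with four directions and no residing bits. This is a minimal illustration under several assumptions: the lattice is periodic, rows are numbered from north to south, and the reshuffling rule is the deterministic one that exchanges the configurations (1, 0, 1, 0) and (0, 1, 0, 1) from the example in step 2 and leaves all other states unchanged. The function names are hypothetical.

```python
import numpy as np

NORTH, EAST, SOUTH, WEST = 0, 1, 2, 3


def propagate(n):
    """Step 1: move bit i of every cell to bit i of the neighbor in direction i."""
    out = np.empty_like(n)
    out[NORTH] = np.roll(n[NORTH], -1, axis=0)   # row index decreases northwards
    out[EAST] = np.roll(n[EAST], 1, axis=1)
    out[SOUTH] = np.roll(n[SOUTH], 1, axis=0)
    out[WEST] = np.roll(n[WEST], -1, axis=1)
    return out


def reshuffle(n):
    """Step 2: exchange the states (1,0,1,0) and (0,1,0,1) in every cell."""
    north_south = n[NORTH] & n[SOUTH] & ~n[EAST] & ~n[WEST]
    east_west = n[EAST] & n[WEST] & ~n[NORTH] & ~n[SOUTH]
    swap = north_south | east_west
    return n ^ swap          # flips all four bits of the selected cells


def lga_step(n):
    return reshuffle(propagate(n))


# State array of shape (b, height, width) holding one boolean per direction.
n = np.zeros((4, 3, 3), dtype=bool)
n[EAST, 1, 0] = True         # bit that will move east
n[WEST, 1, 2] = True         # bit that will move west
n = lga_step(n)
print(n[:, 1, 1])            # state vector of the center cell after one step
```

The two bits meet in the center cell as (0, 1, 0, 1) and are reshuffled to (1, 0, 1, 0); the total number of set bits is unchanged.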

It is clear that the class of LGA-CA that we have just defined is very large.
We have freedom to choose the CA lattice, the interaction list, the number of
residing bits, and the reshuffling rule. Once all this is done, we may expect
that the specific LGA-CA that we defined has a very rich dynamical behavior,
depending on the initial conditions and the size of the grid. Except maybe for
1-dimensional lattices, a detailed study of the dynamics of such a CA is probably
not feasible. It was shown by Moore and Nordahl [49] that the problem of LGA
prediction is P-complete, and thus (unless P = NC) cannot be solved in parallel
in polylogarithmic time. This implies that the only way out is a step-by-step
explicit simulation. Our new CA therefore seems like a nice toy that may exhibit
a very complex dynamical behavior, but no more than that. However, perhaps
surprisingly, if we associate physical quantities with our CA, enforce physical
conservation laws on the bit reshuffling rule of step 2, and use methods from
theoretical physics to study the dynamics, we are in fact able to analyze the CA
in terms of its average behavior, i.e. the average state vector of a cell and the
average flow of bits between cells can be calculated. Even better, it turns out,
again within the correct physical picture, that this CA behaves like a real fluid
(such as water) and therefore can be used as a model for hydrodynamics.
Furthermore, as the LGA rule is intrinsically local (only nearest and
next-nearest neighbor interactions), we have constructed an inherently parallel
model for fluid flow.



4.2 Associating Physics with the LGA-CA 

Our current image of the LGA-CA is that of bits that first move from a cell
to a neighboring cell and are then reshuffled into another direction. Now we
associate the bits in the state vector with particles; a one-bit codes for the
presence of a particle, and a zero-bit codes for the absence of a particle. Assume
that all particles are equal and have a mass of 1. Step 1 in the LGA-CA is
now interpreted as streaming of particles from one cell to another. If we also
introduce a length scale, i.e. a distance between the cells (usually the distance
between nearest-neighbor cells is taken as 1) and a time scale, i.e. a time
duration of the streaming (i.e. step 1 in the LGA-CA rule; usually a time step of
1 is assumed), then we are able to define a velocity $\mathbf{c}_i$ for each
particle in direction $i$ (i.e. the direction associated with the $i$th element of
the state vector $\mathbf{n}$). Step 1 of the LGA-CA is the streaming of particles
with velocity $\mathbf{c}_i$ from one cell to a neighboring cell. The residing
bits can be viewed as particles with a zero velocity, or rest particles. Now we
may imagine, as the particles meet in a cell, that they collide. In this collision
the velocity of the particles (i.e. both absolute speed and direction) can be
changed. The reshuffling of bits in step 2 of the LGA-CA rule can be interpreted
as a collision of particles. In a real physical collision, mass, momentum, and
energy are conserved. Therefore, if we formulate the reshuffling such that these
three conservation laws are obeyed, we have constructed a true Lattice Gas
Automaton, i.e. a gas of particles that can have a small set of discrete
velocities $\mathbf{c}_i$, moving in lock-step over the links of a lattice (space
is discretized) and that all collide with other particles arriving at a lattice
point at the same time. In the collisions, particles may be sent in other
directions, in such a way that the total mass and momentum in a lattice point is
conserved.

We can now associate with each cell of the LGA-CA a density $\rho$ and
momentum $\rho\mathbf{u}$, with $\mathbf{u}$ the velocity of the gas:

$$\rho = \sum_{i=1}^{b} N_i, \tag{13}$$

$$\rho\mathbf{u} = \sum_{i=1}^{b} \mathbf{c}_i N_i, \tag{14}$$

where $N_i = \langle n_i \rangle$, i.e. a statistical average of the Boolean
variables; $N_i$ should be interpreted as a particle density.

The good thing now is that, if we let the LGA evolve and calculate the density
and momentum as defined in Eqs. (13, 14), these quantities behave just like
those of a real fluid.

In Figs. 8A and 8B we show an example of an LGA simulation of flow around
a cylinder. In Fig. 8A we show the results of a single iteration of the LGA, so
in fact we have assumed that $N_i = n_i$. Clearly, the resulting flow field is
very noisy. In order to arrive at smooth flow lines one should calculate
$N_i = \langle n_i \rangle$. Because the flow is static, we calculate $N_i$ by
averaging the boolean variables over a large number of LGA iterations. The
resulting flow velocities are shown in Fig. 8B.
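The averaging procedure and Eqs. (13, 14) can be sketched as follows. For brevity the sketch assumes the four unit velocities of the square lattice example (north, east, south, west) rather than the FHP-III model used for Fig. 8, and the two "snapshots" are made-up data for a single cell; all names are hypothetical.

```python
import numpy as np

# Velocity vectors c_i as (x, y) components: north, east, south, west.
C = np.array([[0, 1], [1, 0], [0, -1], [-1, 0]])


def time_average(snapshots):
    """Estimate N_i = <n_i> by averaging boolean snapshots over iterations."""
    return np.mean(snapshots, axis=0)


def density_and_momentum(N):
    """Eqs. (13) and (14) for an array N of shape (b, height, width)."""
    rho = N.sum(axis=0)                        # rho   = sum_i N_i
    rho_u = np.einsum("ia,ihw->ahw", C, N)     # rho u = sum_i c_i N_i
    return rho, rho_u


# Two snapshots of a lattice consisting of a single cell.
snapshots = [np.array([1, 1, 0, 0]).reshape(4, 1, 1),
             np.array([0, 1, 0, 0]).reshape(4, 1, 1)]
N = time_average(snapshots)
rho, rho_u = density_and_momentum(N)
print(rho[0, 0], rho_u[:, 0, 0])
```

With only two snapshots the estimate is of course still very noisy; the results in Fig. 8B are based on 1000 iterations.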

Fig. 8. A. LGA simulation of flow around a cylinder; the result of a single
iteration of the LGA is shown. The arrows are the flow velocities, and their
length is proportional to the absolute velocity. The simulations were done with
FHP-III on a 32 × 64 lattice; the cylinder has a diameter of 8 lattice spacings,
and only a 32 × 32 portion of the lattice is shown. Periodic boundary conditions
in all directions are assumed. B. As in Fig. 8A, but now the velocities are shown
after averaging over 1000 LGA iterations

The evolution of the boolean variables can be expressed as

$$n_i(\mathbf{x} + \mathbf{c}_i, t + 1) - n_i(\mathbf{x}, t) = \Delta_i(\mathbf{n}(\mathbf{x}, t)), \tag{15}$$

where $\mathbf{x}$ denotes the position of a lattice point and $\Delta$ is the
collision operator. The collision operator must obey mass, momentum, and energy
conservation, i.e.

$$\sum_{i=1}^{b} \Delta_i(\mathbf{n}) = 0, \tag{16}$$

$$\sum_{i=1}^{b} \mathbf{c}_i \Delta_i(\mathbf{n}) = 0, \tag{17}$$

$$\sum_{i=1}^{b} c_i^2 \Delta_i(\mathbf{n}) = 0, \tag{18}$$

where $c_i = |\mathbf{c}_i|$. One can ask if the evolution equation (15) is also
valid for the averaged particle densities $N_i$. It turns out that this is
possible, but only under the Boltzmann molecular chaos assumption, which states
that particles that collide are not correlated before the collision, or, in
equations, that for any number of particles $k$,
$\langle n_1 n_2 \cdots n_k \rangle = \langle n_1 \rangle \langle n_2 \rangle \cdots \langle n_k \rangle$.
In that case one can show that
$\langle \Delta_i(\mathbf{n}) \rangle = \Delta_i(\mathbf{N})$. By averaging
Eq. (15) and applying the molecular chaos assumption we find

$$N_i(\mathbf{x} + \mathbf{c}_i, t + 1) - N_i(\mathbf{x}, t) = \Delta_i(\mathbf{N}(\mathbf{x}, t)). \tag{19}$$

A first-order Taylor expansion of $N_i(\mathbf{x} + \mathbf{c}_i, t + 1)$,
substituted into Eq. (19), results in

$$\partial_t N_i(\mathbf{x}, t) + \partial_\alpha c_{i\alpha} N_i(\mathbf{x}, t) = \Delta_i(\mathbf{N}(\mathbf{x}, t)). \tag{20}$$

Note that the shorthand $\partial_t$ means $\partial / \partial t$, that the
subscript $\alpha$ denotes the $\alpha$-component of a $D$-dimensional vector,
where $D$ is the dimension of the LGA lattice, and that we assume the Einstein
summation convention over repeated Greek indices (e.g. in two dimensions
$\partial_\alpha c_{i\alpha} N_i = \partial_x c_{ix} N_i + \partial_y c_{iy} N_i$).
Next we sum Eq. (20) over the index $i$ and apply Eqs. (13, 14, 16), thus
arriving at $\partial_t \rho + \partial_\alpha (\rho u_\alpha) = 0$, or

$$\frac{\partial \rho}{\partial t} + \nabla \cdot \rho\mathbf{u} = 0, \tag{21}$$

which is just the equation of continuity that expresses conservation of mass in
a fluid. One can also first multiply Eq. (20) by $c_{i\alpha}$ (renaming the
summed index in Eq. (20) to $\beta$) and then sum over the index $i$. In that
case we arrive at



$$\partial_t (\rho u_\alpha) + \partial_\beta \Pi_{\alpha\beta} = 0, \tag{22}$$

with

$$\Pi_{\alpha\beta} = \sum_{i=1}^{b} c_{i\alpha} c_{i\beta} N_i. \tag{23}$$

The quantity $\Pi_{\alpha\beta}$ must be interpreted as the flow of the
$\alpha$-component of the momentum into the $\beta$-direction;
$\Pi_{\alpha\beta}$ is the momentum density flux tensor. In
order to proceed one must be able to find expressions for the particle densities
$N_i$. This is a highly technical matter that is described in detail in e.g.
[11,58]. The bottom line is that one first calculates the particle densities for
an LGA in equilibrium, and substitutes them into Eq. (23). This results in an
equation that is very similar to the Euler equation, i.e. the expression of
conservation of momentum for an inviscid fluid. Next, one proceeds by taking into
account small deviations from equilibrium, resulting in viscous effects. Again,
after a very technical and lengthy derivation one is able to derive the particle
densities, substitute everything into Eq. (23) and derive the full expression for
the momentum conservation of the LGA, which again very closely resembles the
Navier-Stokes equations for an incompressible fluid. The viscosity and sound
speed of the LGA are determined by its exact nature (i.e. the lattice, the
interaction list and the number of residing particles, and the exact definition
of the collision operator).
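For reference, the summation over $i$ that leads from Eq. (20) to the equation of continuity can be written out explicitly. The steps below are a sketch that assumes the collision operator conserves mass for the averaged densities as well, i.e. that Eq. (16) holds with $\mathbf{n}$ replaced by $\mathbf{N}$, and that the velocities $\mathbf{c}_i$ are constants that commute with the derivatives.

```latex
\begin{align*}
\sum_{i=1}^{b} \partial_t N_i + \sum_{i=1}^{b} \partial_\alpha c_{i\alpha} N_i
  &= \sum_{i=1}^{b} \Delta_i(\mathbf{N}) \\
\partial_t \Bigl( \sum_{i=1}^{b} N_i \Bigr)
  + \partial_\alpha \Bigl( \sum_{i=1}^{b} c_{i\alpha} N_i \Bigr)
  &= 0 && \text{(mass conservation, Eq. (16))} \\
\partial_t \rho + \partial_\alpha (\rho u_\alpha)
  &= 0 && \text{(definitions (13) and (14))}
\end{align*}
```

The momentum equation follows in the same way after multiplication with a velocity component, with the momentum conservation of the collision operator, Eq. (17), making the right-hand side vanish.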

At first sight the average, macroscopic behavior of the LGA may come as
a big surprise. The LGA-CA is a model that reduces a real fluid to one that
consists of particles with a very limited set of possible velocities, that live on
the links of a lattice and all stream and collide at the same time. Yet,
theoretical analysis and a large body of simulation results show that, although
the LGA is indeed a very simple model, it certainly is a realistic model of a
real fluid. However, it is true that not all LGA behave as a real fluid. The
underlying lattice must have enough symmetry such that the resulting macroscopic
equations are isotropic, as in a real fluid. For instance, the first LGA, the
so-called HPP model, which is defined on a two-dimensional square lattice with
only nearest-neighbor interactions and no rest particles, is not isotropic. The
FHP models, which have a two-dimensional hexagonal lattice, do possess enough
symmetry and their momentum conservation laws have the desired isotropy property.

To end this section we stress once more that the LGA is an intrinsically local
CA and therefore gives us an inherently parallel model for fluid flow simulations.
Some case studies to support this will be provided in later sections.

4.3 The FHP Model 

The FHP model, named after its discoverers Frisch, Hasslacher, and Pomeau,
was the first LGA with the correct (isotropic) hydrodynamic behavior. The FHP
model is based on a 2-dimensional hexagonal lattice, as in Fig. 9. This figure
also shows examples of streaming and collisions of particles in this model. In
the FHP-I model, which has no rest particles (i.e. $b_r = 0$ and
$b = b_m = 6$), only 2-body and 3-body collisions are possible, see Fig. 10.
Note that all these collision configurations can of course be rotated over
multiples of 60°.

Fig. 9. Lattice and update mechanism of the FHP-I LGA. A dot denotes a
particle and the arrow its moving direction. In A to C the propagation and
collision phases are shown for some initial configuration

Fig. 10. Collision rules of FHP-I. A dot denotes a particle and the arrow its
moving direction. The left figure (A) shows the two-particle collisions, the
right figure (B) the three-particle collisions

For this model we can easily write an explicit expression for the collision
operator, as

$$\Delta_i(\mathbf{n}) = \Delta_i^{(2)}(\mathbf{n}) + \Delta_i^{(3)}(\mathbf{n}). \tag{24}$$

The three-body collision operator is

$$\Delta_i^{(3)}(\mathbf{n}) = n_{i+1} n_{i+3} n_{i+5} \bar{n}_i \bar{n}_{i+2} \bar{n}_{i+4} - n_i n_{i+2} n_{i+4} \bar{n}_{i+1} \bar{n}_{i+3} \bar{n}_{i+5}, \tag{25}$$

where $\bar{n}_i = 1 - n_i$ and the subscripts should be understood as
"modulo 6". A similar expression can be obtained for the two-body collisions
(see e.g. [58]). It is clear that the implementation of this LGA, i.e. the
evolution equation (15) with the FHP-I collision operator (Eq. 24), using
bitwise operations, is straightforward and can result in very fast simulations
with low memory consumption. Furthermore, the inherent locality of the LGA rule
makes parallelization trivial. Next, by averaging the boolean variables $n_i$,
either in space or in time, one obtains the particle densities $N_i$ and from
that, using Eqs. (13, 14), the density and fluid velocity. Many people,
especially those who are used to simulating flow patterns based on numerical
schemes derived from the Navier-Stokes equations, find it hard to believe that
such a simple Boolean scheme is able to produce realistic flow simulations. Yet
the LGA, which in a sense originated from the original ideas of von Neumann, who
invented CA as a possible model to simulate life, is a very powerful and viable
model for hydrodynamics.
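As an illustration, the three-body operator of Eq. (25) can be evaluated directly and checked against the conservation laws of Eqs. (16, 17). The sketch below makes a few assumptions of its own: directions are numbered 0 to 5 instead of 1 to 6, direction $i$ points at an angle of $i \cdot 60°$, and plain integer arithmetic is used instead of the bitwise operations that a fast implementation would employ.

```python
import math
from itertools import product

# Unit velocities on the hexagonal lattice; direction i at an angle of i * 60 degrees.
C = [(math.cos(math.pi * i / 3), math.sin(math.pi * i / 3)) for i in range(6)]


def three_body(n):
    """Evaluate Eq. (25) for a state n of six 0/1 values; indices are modulo 6."""
    nb = [1 - x for x in n]                       # nb[i] = 1 - n[i]
    delta = []
    for i in range(6):
        gain = (n[(i + 1) % 6] * n[(i + 3) % 6] * n[(i + 5) % 6]
                * nb[i] * nb[(i + 2) % 6] * nb[(i + 4) % 6])
        loss = (n[i] * n[(i + 2) % 6] * n[(i + 4) % 6]
                * nb[(i + 1) % 6] * nb[(i + 3) % 6] * nb[(i + 5) % 6])
        delta.append(gain - loss)
    return delta


def conserves_mass_and_momentum(n):
    d = three_body(n)
    mass = sum(d)
    px = sum(c[0] * di for c, di in zip(C, d))
    py = sum(c[1] * di for c, di in zip(C, d))
    return mass == 0 and abs(px) < 1e-12 and abs(py) < 1e-12


state = (1, 0, 1, 0, 1, 0)
delta = three_body(state)
new_state = tuple(s + d for s, d in zip(state, delta))
print(delta, new_state)
print(all(conserves_mass_and_momentum(n) for n in product((0, 1), repeat=6)))
```

Only the two symmetric three-particle configurations are changed by this operator; they are mapped onto each other, and all other states are left untouched.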

4.4 The Lattice Boltzmann Method 

Immediately after the discovery of LGA as a model for hydrodynamics, it was
criticized on three points: noisy dynamics, lack of Galilean invariance, and
exponential complexity of the collision operator. The noisy dynamics is clearly
illustrated in Fig. 8A. The lack of Galilean invariance is a somewhat technical
matter which results in small differences between the equation for conservation
of momentum for LGA and the real Navier-Stokes equations; for details see [58].
Adding more velocities in an LGA leads to increasingly complex collision
operators, whose complexity grows exponentially in the number of particles.
Therefore, another model, the Lattice Boltzmann Method (LBM), was introduced.
This method is reviewed in detail in [10].

The basic idea is that one should not model the individual particles $n_i$, but
immediately the particle densities $N_i$, i.e. one iterates the Lattice Boltzmann
Equation (19). This means that particle densities are streamed from cell to cell,
and particle densities collide. This immediately solves the problem of noisy
dynamics. In a strict sense we no longer have a CA with a boolean state vector,
but we can view LBM as a generalized CA. By a clever choice of the
equilibrium distributions the model becomes isotropic and Galilean invariant,
thus solving the second problem of LGA. Finally, a very simple collision operator
is introduced. This so-called BGK collision operator models the collisions as a
single-time relaxation towards equilibrium, i.e.

$$\Delta_i^{BGK}(\mathbf{N}) = \tau \left( N_i^{(0)} - N_i \right). \tag{26}$$

Eqs. (19, 26) together with a definition of the equilibrium distributions result
in the Lattice-BGK (L-BGK) model. The L-BGK model leads to correct hydrodynamic
behavior. The viscosity of a two-dimensional L-BGK on a hexagonal
lattice is given by:

(27)

The L-BGK has also been developed for other lattices, e.g. two- or
three-dimensional cubic lattices with nearest and next-nearest neighbor
interactions. The LBM, and especially the L-BGK, has found widespread use in
simulations of fluid flow.

4.5 Parallelism and Applications

The LGA and LBM have been used to simulate many types of flow, especially in
complex geometries. In Section 5.1 we show such an application in detail. Here
we will further discuss parallelism in LGA and LBM, and show some examples
of applications of large scale parallel LGA and LBM simulations.

The local nature of the LGA and LBM interactions allows a very straight- 
forward realization of parallelism. A simple geometric decomposition of the lat- 
tice with only local message passing between the boundaries of the different 
domains is sufficient to realize an efficient parallel simulation. For instance, we 
have developed a generic 2-dimensional LGA implementation that is suitable for 
any multi-species (thermal) LGA [16]. Parallelism was introduced by means of 
a 1-dimensional, i.e. strip-wise, decomposition of the lattice. As long as the grid 
dimension compared to the number of processors is large enough, this approach 
results in a very efficient parallel execution. This LGA system is mainly used for 
simulations in simple rectangular domains without internal obstacles. However,
in a more general case, where the boundaries of the grid have other forms and
internal structure (i.e. solid parts where no particles will flow), the simple
strip-wise decomposition results in severe load imbalance. In this case, as was
shown by Kandhai et al. [30], a more advanced decomposition scheme, the
Orthogonal Recursive Bisection (ORB) method, still leads to highly efficient
parallel LBM simulations. ORB restores the load balance, however at the price
of a somewhat more complicated communication pattern between processors.
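The strip-wise decomposition with local message passing can be illustrated in a single process. The sketch below is not the implementation of [16]: it replaces the LGA rule by a simple stand-in rule with the same nearest-neighbor locality (the sum of the four neighbors modulo 2), assumes periodic boundaries, and mimics the message passing by copying boundary rows between strips. All names are hypothetical.

```python
import numpy as np


def update(padded):
    """Update the interior rows of a strip that carries one halo row on each side."""
    up, centre, down = padded[:-2], padded[1:-1], padded[2:]
    left = np.roll(centre, 1, axis=1)      # columns are periodic inside a strip
    right = np.roll(centre, -1, axis=1)
    return (up + down + left + right) % 2


def global_step(lattice):
    """Reference update of the whole lattice with periodic boundaries."""
    return update(np.vstack([lattice[-1:], lattice, lattice[:1]]))


def strip_step(lattice, n_proc):
    """The same update, computed strip by strip as n_proc processors would."""
    strips = np.array_split(lattice, n_proc, axis=0)
    new_strips = []
    for p, strip in enumerate(strips):
        # Boundary rows "received" from the two neighboring processors.
        above = strips[(p - 1) % n_proc][-1:]
        below = strips[(p + 1) % n_proc][:1]
        new_strips.append(update(np.vstack([above, strip, below])))
    return np.vstack(new_strips)


rng = np.random.default_rng(0)
lattice = rng.integers(0, 2, size=(12, 8))
print(np.array_equal(global_step(lattice), strip_step(lattice, n_proc=3)))
```

Each strip only needs one boundary row from each of its two neighbors per time step, so the communication volume is independent of the strip height; this is why the approach is efficient as long as the strips contain enough rows.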

As in many models, the specification of initial and boundary conditions turns
out to be much more difficult than anticipated. The same is true for LGA and
LBM. Solid walls are usually implemented using a bounce back rule (i.e. sending
a particle back into the direction where it came from), thus implementing a
no-slip boundary. Kandhai et al. have investigated this bounce back rule in
detail for L-BGK [31] and conclude that this simple rule, although it may have a
negative effect on the accuracy of the L-BGK, is a very suitable boundary
condition in simulations, as long as one is not interested in the details of the
flow close to the boundaries. Kandhai et al. also investigated several
formulations for initial conditions and other types of boundary conditions (e.g.
specifying a certain velocity on a boundary).

As an example of a large scale parallel L-BGK simulation we refer to Ref. [34],
where flow in a random fibrous network, as a model for paper, was simulated. The
permeability that was obtained from the simulations was in very good agreement
with experimental results. Another impressive example is flow in a Static Mixer
Reactor [32]. Here, L-BGK simulations and conventional Finite Element (FE)
simulations were compared, and they agreed very well. The simulation results also
agree very well with experimental results. This shows that L-BGK, which is much
easier to parallelize and much easier to extend with more complex modeling
(multi-species flow, thermal effects, reactions) than FE, is very suitable
for real-life problems involving complex flow.

5 Selected Applications

5.1 Modeling Growth and Form in a Moving Fluid Using
Synchronous Cellular Automata

The basic idea of modeling growth and form of marine sessile suspension feeders,
such as sponges and stony corals [28,29], will be briefly discussed in the
next section. The simulated growth forms will be discussed only qualitatively;
more detailed quantitative measurements, for example of the space-filling
properties expressed in fractal dimensions, the centers of gravity of the
simulated objects, and absorption measurements, are presented elsewhere [28,29].
In the model both the parallelism present in the physical environment (dispersion
of nutrients through diffusion and flow) and the parallelism present in the
growth process will be exploited. The dispersion of nutrients will be modeled
using the lattice Boltzmann method discussed in Section 4.4, while the growth of
the stony coral will be modeled using a probabilistic cellular automaton.

5.1.1 Biological Background Many marine sessile suspension feeders from
various taxonomic groups, such as sponges, hydrocorals, and stony corals,
exhibit a strong morphological plasticity, which is in many cases related to the
impact of hydrodynamics. The exposure to water movement represents one of
the dominant environmental parameters. In a number of cases it is possible to
arrange growth forms of sponges, hydrocorals, and stony corals along a gradient
of the amount of water movement [27]. In the examples of stony corals, the
growth form gradually transforms from a compact shape under exposed conditions
to a thin-branching one under sheltered conditions. In Fig. 11 two extreme
examples of growth forms of Pocillopora damicornis are shown. Form A originates
from a site sheltered from water movement, while B was collected from an
exposed site. Between both extremes, a range of gradually changing intermediate
growth forms exists [67].

Stony corals are typical modular organisms. Modular growth is defined as the
growth of genetically identical individuals by repeated iteration of
(multi-cellular) parts: the modules [22]. Modules might be the polyp of a coral
or for instance
a shoot with an apical meristem in seed plants. The modular growth of stony 
corals is relatively simple when compared to more complex modular organisms 
like seed plants. The modular growth of these organisms can be defined as paral- 
lel modular growth, where the various modules grow almost independently, only 
limited by steric hindrance. Because of the almost independently growing mod- 
ules, which are not limited by the development of other modules, some important 
simplifications can be made in the modeling of the growth process. The organ- 
isms can increase in size without a decrease in growth velocities, where growth of 
these organisms will be limited by external factors like strong water movements. 

Fig. 11. Growth forms of the stony coral Pocillopora damicornis. Form A
originates from a site sheltered from water movement, form B originates from an
exposed site

To obtain insight into the influence of hydrodynamics on the growth process
of sessile suspension feeders, a morphological simulation model was developed.
In the absence of flow, the distribution of nutrients around the growth form
can be modeled as a diffusion process in a steady state: there is a source of
suspended material and the organism continuously consumes nutrients from its
environment. In general in a marine environment, there will be a significant
contribution of the hydrodynamics to the dispersion pattern of the suspended
material around the growth form. In this case the distribution of nutrients around
the organism will be determined by a combination of flow and diffusion.



5.1.2 A CA model of Flow and Nutrient Distributions The flow pat- 
tern around the simulated growth form was computed by applying the lattice 
Boltzmann method (see Section 4.4) in combination with a tracer step to study 
the dispersion of “nutrient” in the system. In these simulations the nutrient par- 
ticles are dispersed by the combined process of flow and diffusion. With this 
method simulated growth processes can be studied for various Peclet numbers 
Pe defined as:

\[ \mathrm{Pe} = \frac{u\,l}{D} , \tag{28} \]

where u is the mean flow velocity, l a characteristic length, and D the diffusion
coefficient of the nutrient. The contribution of flow to the nutrient distribution 
can be quantified by the Peclet number. A low value indicates that particles 
move mainly by diffusion (no influence of hydrodynamics, diffusion dominates) 
and a high value indicates that their motion is dominated by flow. 
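
As a small illustration of Eq. (28), the sketch below computes the Peclet
number and labels the regime. The function names and the threshold of 1 that
separates the two regimes are assumptions made for this example; the text only
states that low values mean diffusion dominates and high values mean flow
dominates.

```python
def peclet_number(u: float, l: float, D: float) -> float:
    """Peclet number Pe = u * l / D for mean flow velocity u,
    characteristic length l and diffusion coefficient D."""
    if D <= 0.0:
        raise ValueError("the diffusion coefficient D must be positive")
    return u * l / D


def dominant_transport(pe: float, threshold: float = 1.0) -> str:
    """Label the regime; the threshold is an illustrative assumption."""
    return "flow" if pe > threshold else "diffusion"


if __name__ == "__main__":
    # Varying D at constant u, as done in the simulations described here.
    for D in (200.0, 10.0, 1.0):
        pe = peclet_number(u=0.5, l=6.0, D=D)
        print(D, pe, dominant_transport(pe))
```

With this convention the two values used later in this section, Pe = 0.015 and
Pe = 3.000, fall into the diffusion dominated and flow dominated regime,
respectively.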

Two types of boundary conditions are used: at the borders of the lattice
periodic boundary conditions are applied, while in the nodes adjacent to the
nodes representing the simulated growth form, solid boundary conditions are
used. Periodic boundary conditions can be implemented by exchanging the n_i's
of the links at the borders of the lattice. Solid boundary conditions can be
represented by exchanging the n_i's between the adjacent node and a neighboring
fluid node.

After a lattice Boltzmann iteration, a tracer step is applied where populations 
of tracer particles are released from source nodes, while the tracer particles are 
absorbed by the sink nodes: the nodes adjacent to the growth form. The tracer 
particles can travel from a node at site r in the lattice to one of the 18 adjacent 
nodes r + q, where the Peclet number Eq. (28) determines if flow or diffusion 
dominates. When flow dominates, most particles will move in the direction of
the governing flow, while under diffusion dominated conditions the numbers of
particles traveling in each of the 18 directions will be about equal. In the
simulations the diffusion coefficient D is varied, while u is kept constant by
adjusting the driving force F of the system: because the object grows, the
velocity in the free fluid would gradually decrease if the driving force were
not adjusted. For details on the computational model we refer to [28].



5.1.3 A CA Model of the Growth Process The growth process is rep- 
resented by a probabilistic cellular automaton in a similar way as done in the 
Diffusion Limited Aggregation model [71]. In [42] and [43] it is demonstrated
that the Diffusion Limited Aggregation growth model is P-complete, a class of
problems for which fast parallel algorithms probably do not exist. The only
available option to study this type of growth process is therefore explicit
simulation. The basic construction of the aggregate is shown in Fig. 12. The
cluster is initialized with a “seed” positioned at the bottom. In both the
cluster and substrate sites solid boundary conditions are applied. Two flow
regimes were studied in the simulations:

1. growth of the aggregates in a mono directional flow; 

2. growth of the aggregates in a bidirectional (alternating) flow. 

The flow velocity in the system is kept at a low value, so that all simulations
are done in the laminar flow regime. Tracer particles are released from the
source plane.
The tracer particles are absorbed by the fluid nodes adjacent to obstacle nodes, 
which can be nodes in the substrate plane or the aggregate nodes. In the growth 
model it is assumed that both the tracer distribution and flow velocities are in 
equilibrium and the growth velocity of the aggregate is much slower than the 
dispersion of the tracer. In the sink nodes the amount of absorbed tracer particles
is determined and a new node is added to the aggregate. The probability p that
k, an element from the set of open circles ∘ (the adjacent sink nodes), will be
added to the set of black circles • (the aggregate nodes) is given by

\[ p(k \in \circ \rightarrow k \in \bullet) = \frac{a_k}{\sum_{j \in \circ} a_j} , \tag{29} \]

where a_k is the absorbed amount of tracer particles at position k.
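
The selection of a growth site can be sketched as follows. We read Eq. (29) as
normalising the amount absorbed at each candidate sink node by the total amount
absorbed over all candidates; that reading, the dictionary representation of the
candidates, and the function names are assumptions of this example and are not
taken from the implementation in [28].

```python
import random


def growth_probabilities(absorbed):
    """Map each candidate node k to a_k / sum_j a_j, where `absorbed`
    maps candidate sink nodes to their absorbed tracer amount a_k."""
    total = sum(absorbed.values())
    if total <= 0.0:
        raise ValueError("no tracer was absorbed by any candidate node")
    return {node: amount / total for node, amount in absorbed.items()}


def select_growth_site(absorbed, rng=random):
    """Draw one candidate node with probability proportional to a_k."""
    probs = growth_probabilities(absorbed)
    nodes = list(probs)
    weights = [probs[node] for node in nodes]
    return rng.choices(nodes, weights=weights, k=1)[0]


if __name__ == "__main__":
    # Hypothetical absorbed amounts at three sink nodes (x, y, z).
    absorbed = {(5, 5, 3): 3.0, (4, 5, 2): 1.0, (6, 5, 2): 0.0}
    print(growth_probabilities(absorbed))
    print(select_growth_site(absorbed, random.Random(42)))
```

A node that absorbed no tracer, such as one between two branches in a depleted
region, has zero probability of being added to the aggregate.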
Fig. 12. Basic construction of the aggregate (labels in the figure: source
plane, substrate plane)



In the bidirectional flow simulations the flow direction is reversed after each 
growth step. The aggregation model is summarized in pseudo-code below: 



initialize aggregate;
initialize flow direction;

do {
    compute flow velocities until equilibrium;
    compute tracer distribution until equilibrium;
    compute probabilities p that nodes neighboring to the aggregate
        nodes will be added to the aggregate;
    select randomly with probability p one of the growth candidates
        and add it to the aggregate;
    for the bidirectional case: reverse flow direction;
} until ready

5.1.4 Parallelization of the CA models We performed the simulations on
a lattice consisting of 144^3 sites. The algorithm was implemented in parallel and
the simulations were carried out on 16 nodes of a distributed memory Parsytec
CC/40 system (approximately 6 Gflop/s). The parallel implementation exploits
the nearest neighbor locality present in the lattice Boltzmann step, the tracer
calculation, and the growth model. The 144^3 lattice is decomposed into a number
of sublattices which are distributed over the processors. The main computation
steps (lattice Boltzmann and tracer calculation) are done in the fluid nodes,
while only in the growth step some computation is required in the aggregate
nodes. Due to the growth of the aggregate a straightforward decomposition (for
example partitioning of the lattice in equal sized slices or boxes) would lead
to a strong load imbalance. To solve this problem we have tested two strategies
to obtain a more equal distribution of the load over the processors:



1. Box decomposition in combination with scattered decomposition. 

2. Orthogonal Recursive Bisection (ORB) in combination with scattered 
decomposition. 



In the box decomposition method the lattice is partitioned in 2D in equal
sized boxes. In the ORB method [61] the object (the aggregate) is split into two
subdomains perpendicular with respect to the xy-plane; after this similar splits
are made for the yz-plane and the xz-plane, respectively. This method is
recursively repeated for each subdomain. In the scattered decomposition method
[50,57] blocks of data are scattered over the processors. The original lattice is
divided into blocks by using a partitioning method. These blocks are randomly
scattered over the processors, where each processor has the same number of
blocks. An example of a scattered decomposition over 4 processors of an irregular
shaped object in a 2D lattice is shown in Fig. 13. In this example the lattice is
divided into 100 blocks, where each block is randomly assigned to one of the four
processors. Most of the computation is done in the blocks containing exclusively
fluid nodes. The scattering of the blocks over the processors leads to a spatial
averaging of the load, where decreasing block sizes cause a better load balancing
and an increasing communication overhead [57]. Scattered decomposition is an
attractive option to improve the load balance in parallel simulations, especially
in simulations in which the shape of the object cannot be predicted [42,43]. We
have compared both decomposition strategies by computing the load balancing
efficiency:



\[ \text{Load balancing efficiency} = \frac{l_{\min}}{l_{\max}} , \tag{30} \]

where l_min is the load of the slowest process and l_max the load of the fastest
process. The two decomposition strategies were tested by using two extreme 
morphologies of the aggregates: a very compact shape and a dendritic shaped 
aggregate and by measuring the load balancing efficiency during the lattice Boltz- 
mann and tracer computation required in one growth step. 
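
The scattered decomposition and the efficiency measure of Eq. (30) can be
sketched as below. The per-block load is taken to be a single number (for
instance the number of fluid nodes in the block), and the efficiency is computed
as the smallest per-processor load divided by the largest; both choices, like the
function names, are assumptions of this illustration rather than a description of
the measurement procedure used in the experiments.

```python
import random


def scattered_decomposition(n_blocks, n_procs, seed=0):
    """Randomly assign blocks to processors so that every processor
    receives the same number of blocks. Returns a list of owners."""
    if n_blocks % n_procs != 0:
        raise ValueError("n_blocks must be a multiple of n_procs")
    owners = [block % n_procs for block in range(n_blocks)]
    random.Random(seed).shuffle(owners)
    return owners


def load_balancing_efficiency(block_loads, owners, n_procs):
    """Ratio of the smallest to the largest per-processor load."""
    loads = [0.0] * n_procs
    for load, owner in zip(block_loads, owners):
        loads[owner] += load
    if max(loads) == 0.0:
        raise ValueError("all processors are idle")
    return min(loads) / max(loads)


if __name__ == "__main__":
    # Hypothetical 10 x 10 block grid; blocks covered by the object
    # (here the lower-left quarter) contain fewer fluid nodes.
    block_loads = [20.0 if (b % 10 < 5 and b // 10 < 5) else 100.0
                   for b in range(100)]
    owners = scattered_decomposition(100, 4, seed=1)
    print(load_balancing_efficiency(block_loads, owners, 4))
```

A value of 1 means that all processors carry the same load; the closer the value
is to 0, the longer the lightly loaded processors wait for the heavily loaded
ones.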



5.1.5 Comparison of the Load Balancing Strategies In Fig. 14 the load 
balancing efficiencies (see Eq. 30) are shown for the two load balancing strategies: 
box decomposition in combination with scattered decomposition, and ORB in 
combination with scattered decomposition. In this comparison the two strategies 
were tested for both extreme morphologies of the aggregate: a compact shaped 
aggregate shown in Fig. 15C and a thin-branching (dendritic shaped) object 
depicted in Fig. 15A. 



Fig. 13. Decomposition of an irregular shaped object in a 2D lattice. In this
case 100 blocks are scattered over 4 processors

Fig. 14. The load balancing efficiencies of 4 different experiments as a function
of the total number of blocks (horizontal axis: number of blocks, 0 to 140;
curves: box scattered with compact obstacle, ORB scattered with compact
obstacle, box scattered with dendritic obstacle, ORB scattered with dendritic
obstacle). The dendritic object is shown in Fig. 15A and the compact object is
depicted in Fig. 15C

5.1.6 Simulated Growth Forms in a Mono Directional Flow Regime

In the simulations with a mono directional flow regime it was found that the
aggregate gradually changes from a thin-branching morphology (diffusion
dominates) into a compact shape (flow dominates), for an increasing Peclet number.
In Fig. 15 the results of the simulations are summarized by showing slices through 
the aggregate. The simulation box is sectioned parallel to the direction of the 
flow. In the sequence A-C in Fig. 15, the Pe number increases. In this picture it 
can be observed that the degree of compactness increases for larger Pe numbers. 
Furthermore, the effect of the mono-directional flow can be clearly observed: the
aggregates tend to grow towards the direction of the flow (the flow is directed 
from the left to the right). 



Fig. 15. Slices through the middle of the aggregates in the xy-plane,
one-directional flow experiment: A-C, Peclet number increases from approximately
0.0 to 3.0. The flow is directed from the left to the right



5.1.7 Simulated Growth Forms in a Bidirectional (Alternating) Flow 
Regime In the previous section it is assumed that the growth form develops 
under mono-directional flow conditions. As a consequence an asymmetric form 
develops, as shown in Fig. 15, where the aggregates tend to grow in the upstream 
direction. This trend becomes stronger for higher Pe numbers. In reality, the flow 
direction will basically reverse twice a day due to the tidal movements. A better 
approximation is to use a bidirectional flow system by using an aggregation 
model in which the flow direction is reversed after each growth step [29].

The morphology of the aggregates, in the bidirectional flow experiment, is 
depicted in Fig. 16. 

In Figs. 17 and 18 slices through the simulation box in the xz-plane are shown
in which the nutrient distribution is visualized, for the Pe numbers 3.000 (flow
dominates) and 0.015 (diffusion dominates), respectively. The shift from black
to white in these pictures indicates a depletion of nutrients: black indicates
the maximum concentration of nutrient, while the regions with nearly zero
concentration are shown in white.




Fig. 16. Slices through the middle of the aggregates in the xy-plane, alternating
flow experiment: A-C, Peclet number increases from approximately 0.0 to 3.0




Fig. 17. Slice through the simulation box in the xz-plane showing the nutrient
distribution in two successive growth stages in the alternating flow experiment
in which Pe is set to the value 3.000 (flow dominates). The flow is directed
from top to bottom in A and from bottom to top in B



5.1.8 Discussion 

Parallelization aspects When comparing the load balancing efficiencies (Eq. 30)
for the compact and dendritic objects in Fig. 14, comparable results are obtained
for the dendritic object with both decomposition strategies. For the compact
object the best load balancing efficiencies are obtained with box decomposition
in combination with scattered decomposition. The main explanation for the
difference between the compact and the dendritic object, for both strategies, is
that in the latter case the object is already dispersed over space.
Fig. 18. Slice through the simulation box in the xz-plane showing the nutrient
distribution in two successive growth stages in the alternating flow experiment
in which Pe is set to the value 0.015 (diffusion dominates). The flow is directed
from top to bottom in A and from bottom to top in B



This property, the degree of space-filling of the object, can be quantified using
the fractal box dimension D_box of the object [28]. The fractal dimension D_box of
the surface of the object can be determined using a 3D version of the (cube)
box-counting method described by Feder [18]. In three dimensions its value varies
from a minimum of 2, for a solid object with a perfectly smooth surface, to a
maximum of 3 for a solid with a space-filling surface. The D_box of the compact
object was approximately 2.0, while in the dendritic case a value of
approximately 2.3 was found. A major disadvantage of both strategies is that,
although the load balancing efficiencies increase with the number of blocks used
in the scattered decomposition, the communication overhead also increases. The
results in Fig. 14 show that the morphology of the object strongly influences the
degree of improvement introduced by increasing the number of scattered blocks.
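
To make the box dimension concrete, the following sketch estimates it for a set
of surface voxels by counting occupied cubes at several cube sizes and fitting
the slope of log N against log(1/size). It is a generic box-counting estimate
written for this illustration (the voxel representation, the function names and
the choice of cube sizes are assumptions) and not the procedure of [18] or [28]
in detail.

```python
import numpy as np


def box_count(voxels, size):
    """Number of cubes of edge `size` that contain at least one set voxel.
    `voxels` is a cubic boolean array whose edge is a multiple of `size`."""
    m = voxels.shape[0] // size
    blocks = voxels[:m * size, :m * size, :m * size]
    blocks = blocks.reshape(m, size, m, size, m, size)
    return int(blocks.any(axis=(1, 3, 5)).sum())


def box_dimension(voxels, sizes):
    """Slope of log(count) versus log(1/size), fitted by least squares."""
    counts = [box_count(voxels, s) for s in sizes]
    inv_sizes = 1.0 / np.asarray(sizes, dtype=float)
    slope, _intercept = np.polyfit(np.log(inv_sizes), np.log(counts), 1)
    return float(slope)


if __name__ == "__main__":
    # A perfectly flat surface in a 32^3 box gives a dimension of 2.
    surface = np.zeros((32, 32, 32), dtype=bool)
    surface[:, :, 0] = True
    print(box_dimension(surface, sizes=[1, 2, 4, 8]))
```

For a flat plane the estimate is 2 and for a completely filled box it is 3,
which matches the two limits quoted above.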



Biological aspects The nutrient distributions shown in Figs. 17 and 18
demonstrate the main differences between diffusion and flow dominated regimes.
For low Peclet numbers the distribution of nutrient is roughly symmetric about
the center of the aggregate: the highest concentrations reside at the tips of
the aggregate, while between the branches an area depleted of nutrients is
found, with a very low growth probability. At higher Peclet numbers a clear
asymmetry develops in the distribution with a depleted region developing
downstream of the object (see Fig. 17). As a consequence, there is a low
probability of growth in the depleted region. A gradual increase of compactness
is demonstrated in Fig. 16 for an increasing influence of hydrodynamics. This
gradual increase of compactness corresponds qualitatively to the observations
made in stony corals, hydrocorals, and sponges, where growth forms gradually
transform from a compact shape, under conditions exposed to water movement, into
a thin-branching one under sheltered conditions [25,27,67]. When comparing the
slices through the aggregates shown in Figs. 15 and 16 it can also be observed
that the increasing degree of asymmetry in the aggregate in the mono-directional
flow experiment, for increasing Pe numbers, has disappeared in the alternating
flow experiment. In the latter experiments, aggregates have developed with a
roughly radiate symmetry, which corresponds qualitatively to the shape of
branching sessile organisms, as for example Pocillopora damicornis. These
experiments seem to indicate that an alternating flow, a reversal of the flow
direction basically twice a day, leads to radiate symmetrical growth forms.

The alternating flow model is a strong simplification of the actual growth
process. In many stony corals, as for example the species Pocillopora damicornis,
photosynthesis represents a major energy input. The actual growth process in
many sponges, stony corals, and hydrocorals [25] consists of adding layers of
material (varying in thickness) on top of the preceding growth stage, rather than
the addition of particles. An accretive growth model, in which layers of material
are constructed on top of the previous layers and where the local thickness is 
determined by the local amount of absorbed nutrients or light intensity [25], also 
offers possibilities for a quantitative morphological comparison of simulated and 
actual growth forms [2,26]. 



5.2 Ising Spin Model Using Asynchronous Cellular Automata 

The Ising spin model is a model of a system of interacting variables in
statistical physics. The model was proposed by Wilhelm Lenz and investigated by
his graduate student, Ernst Ising, to study the phase transition from a
paramagnet to a ferromagnet [6]. A variant of the Ising spin model that
incorporates the time evolution of the physical system is a prototypical example
of how Asynchronous Cellular Automata can be used to simulate asynchronous
temporal behavior. The resulting ACA model is executed using the Time Warp
optimistic simulation method (see Section 3).

A key ingredient in the theory of magnetism is the electron’s spin and the
associated magnetic moment. Ferromagnetism arises when a collection of such
spins conspire so that all of their magnetic moments align in the same
direction, yielding a total magnetic moment that is macroscopic in size. As we
are interested in how macroscopic ferromagnetism arises, we need to understand
how the microscopic interaction between spins gives rise to this overall
alignment. Furthermore, we would like to study how the magnetic properties depend
on temperature, as systems generally lose their magnetism at high temperatures.



5.2.1 The Ising Spin Model To introduce the Ising model, consider a lattice
containing N sites and assume that each lattice site i has associated with it a
number s_i, where s_i = +1 for an “up” spin and s_i = -1 for a “down” spin.
A particular configuration or microstate of the lattice is specified by the set of
variables {s_1, s_2, ..., s_N} for all lattice sites (see Fig. 19).

Fig. 19. Schematic spin model for a ferromagnet



The macroscopic properties of a system are determined by the nature of the 
accessible microstates. Hence, it is necessary to know the dependence of the 
energy on the configuration of spins. The total energy of the Ising spin model is 
given by 



\[ E = -J \sum_{i,j=nn(i)}^{N} s_i s_j - \mu_0 H \sum_{i=1}^{N} s_i , \tag{31} \]

where s_i = ±1, J is the measure of the strength of the interaction between spins,
and the first sum is over all pairs of spins that are nearest neighbors (see Fig. 20).
The second term in Eq. 31 is the energy of interaction of the magnetic moment,
μ_0, with an external magnetic field, H.





Fig. 20. The interaction energy between nearest neighbor spins in the absence
of an external magnetic field: E = -J for two parallel spins and E = +J for two
antiparallel spins



If J > 0, then the states ↑↑ and ↓↓ are energetically favored in comparison to
the states ↑↓ and ↓↑. Hence for J > 0, we expect that the state of the lowest total
energy is ferromagnetic, i.e., the spins all point in the same direction. If J < 0,
the states ↑↓ and ↓↑ are favored and the state of the lowest energy is expected
to be antiferromagnetic, i.e., alternate spins are aligned. If we add a magnetic
field to the system, the spins will tend to orient themselves parallel to H, since
this lowers the energy.
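
The total energy of Eq. 31 can be evaluated directly for a given configuration.
The sketch below does so for a two-dimensional square lattice with periodic
boundaries, counting every nearest neighbor pair once; the lattice geometry, the
boundary condition, the function name and the default parameter values are
assumptions made for this illustration.

```python
import numpy as np


def ising_energy(spins, J=1.0, mu0=1.0, H=0.0):
    """Total energy of a 2D spin array with entries +1 or -1.
    Each nearest neighbor pair is counted once by pairing every site
    with its right and lower neighbor (periodic wrap-around)."""
    right = np.roll(spins, -1, axis=1)
    down = np.roll(spins, -1, axis=0)
    interaction = -J * np.sum(spins * (right + down))
    field = -mu0 * H * np.sum(spins)
    return float(interaction + field)


if __name__ == "__main__":
    aligned = np.ones((4, 4), dtype=int)
    alternating = np.indices((4, 4)).sum(axis=0) % 2 * 2 - 1
    print(ising_energy(aligned))             # lowest energy for J > 0
    print(ising_energy(alternating))         # highest energy for J > 0
    print(ising_energy(alternating, J=-1.0)) # lowest energy for J < 0
```

On a 4 x 4 lattice there are 32 nearest neighbor pairs, so for J = 1 and H = 0
the aligned configuration has energy -32 and the alternating configuration +32;
changing the sign of J swaps the two.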

The average of the physical quantities in the system, such as energy E or
magnetization M, can be computed in two ways. The time average of a physical
quantity is measured over a time interval sufficiently long to allow the system
to sample a large number of microstates. Although the time average is conceptually
simple, it is convenient to formulate statistical averages at a given instant of time.
In this interpretation, all realizable system configurations describe an ensemble
of identical systems. Then the ensemble average of the mean energy E is given by

\[ \langle E \rangle = \sum_{s=1}^{m} E_s P_s , \]

where P_s is the probability to find the system in microstate s, and m is the
number of microstates.

Another physical quantity of interest is the magnetization of the system. The
total magnetization M for a system of N spins is given by

\[ M = \sum_{i=1}^{N} s_i . \]

In our study of the Ising spin system, we are interested in the equilibrium
quantity ⟨M⟩, i.e., the ensemble average of the mean magnetization M.

Besides the mean energy, another thermal quantity of interest is specific heat
or heat capacity C. The heat capacity C can be determined by the statistical
fluctuation of the total energy in the ensemble:

\[ C = \frac{1}{k_B T^2} \left( \langle E^2 \rangle - \langle E \rangle^2 \right) . \]

And in analogy to the heat capacity, the magnetic susceptibility χ is related to
the fluctuations of the magnetization:

\[ \chi = \frac{1}{k_B T} \left( \langle M^2 \rangle - \langle M \rangle^2 \right) . \]
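
Given a series of sampled energies and magnetizations, both fluctuation
quantities follow from sample means. The sketch assumes the standard fluctuation
relations C = (⟨E²⟩ - ⟨E⟩²)/(k_B T²) and χ = (⟨M²⟩ - ⟨M⟩²)/(k_B T) and works in
units where k_B = 1 by default; the function names are chosen for this example.

```python
import numpy as np


def heat_capacity(energies, T, k_B=1.0):
    """Heat capacity from the variance of sampled total energies."""
    e = np.asarray(energies, dtype=float)
    return float((np.mean(e ** 2) - np.mean(e) ** 2) / (k_B * T ** 2))


def susceptibility(magnetizations, T, k_B=1.0):
    """Magnetic susceptibility from the variance of sampled magnetizations."""
    m = np.asarray(magnetizations, dtype=float)
    return float((np.mean(m ** 2) - np.mean(m) ** 2) / (k_B * T))


if __name__ == "__main__":
    # Hypothetical samples, only to show the calling convention.
    print(heat_capacity([-30.0, -28.0, -32.0, -30.0], T=2.0))
    print(susceptibility([14.0, 16.0, 12.0, 14.0], T=2.0))
```

Both quantities vanish when the samples do not fluctuate at all and grow with
the spread of the samples.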

For the Ising model the dependence of the energy on the spin configuration
(Eq. 31) is not sufficient to determine the time-dependent properties of the
system. That is, Eq. 31 does not tell us how the system changes from one spin
configuration to another; we therefore have to introduce the dynamics
separately.

5.2.2 The Dynamics in the Ising Spin Model Physical systems are generally
not isolated, but are part of a larger environment. In this respect, the
systems exchange energy with their environment. As the system is relatively
small compared to the environment, any change in the energy of the smaller
system does not have an effect on the temperature of the environment. The
environment acts as a heat reservoir or heat bath at a fixed temperature T.
From the perspective of the small system under study, it is placed in a heat bath
and it reaches thermal equilibrium by exchanging energy with the environment
until the system attains the temperature of the bath.

A fundamental result from statistical mechanics is that for a system in
equilibrium with a heat bath, the probability of finding the system in a particular
microstate is proportional to the Boltzmann distribution [55]

\[ P_s \propto e^{-\beta E_s} , \]

where β = 1/k_B T, k_B is Boltzmann’s constant, E_s is the energy of microstate
s, and P_s is the probability of finding the system in microstate s.



The Metropolis Algorithm To introduce the dynamics that describes how the 
system changes from one configuration to another, we need an efficient method 
to obtain a representative sample of the total number of microstates, while the 
temperature T of the system is fixed. The determination of the equilibrium quan- 
tities is time independent, that is the computation of these quantities does not 
depend on simulation time. As a result, we can apply Monte Carlo simulation 
methods to solve the dynamics of the system. The well-known Metropolis algo- 
rithm uses the Boltzmann distribution to effectively explore the set of possible 
configurations at a fixed temperature T [4]. The Metropolis algorithm samples 
a representative set of microstates by using an importance sampling method to 
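generate microstates according to a probability function

\[ \pi_s = \frac{e^{-\beta E_s}}{\sum_{s'=1}^{m} e^{-\beta E_{s'}}} . \]

This choice of π_s implies that the ensemble average for the mean energy and
mean magnetization can be written as

\[ \langle E \rangle = \frac{1}{m} \sum_{s=1}^{m} E_s \quad \text{and} \quad \langle M \rangle = \frac{1}{m} \sum_{s=1}^{m} M_s . \]
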

The resulting Metropolis algorithm samples the microstates according to the
Boltzmann probability. First, the algorithm makes a random trial change (a spin
flip) in the microstate. Then the energy difference ΔE is computed. The trial is
accepted with probability e^{-βΔE} (note that for ΔE ≤ 0 the probability is equal
to or larger than one and the trial is always accepted). After the trial, accepted
or not accepted, the physical quantities are determined, and the next iteration
of the Metropolis algorithm can be started.
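
One Monte Carlo step per spin of the Metropolis algorithm can be sketched as
follows for a square lattice without external field. The two-dimensional lattice
with periodic boundaries, the list-of-lists representation, and the function name
are assumptions of this example; the energy difference of a single spin flip,
ΔE = 2 J s_i Σ s_j over the four nearest neighbors j, follows from Eq. 31 with
H = 0.

```python
import math
import random


def metropolis_sweep(spins, beta, J=1.0, rng=random):
    """Perform L*L single-spin trials on an L x L lattice (modified in
    place) and return the number of accepted spin flips."""
    L = len(spins)
    accepted = 0
    for _ in range(L * L):
        # Random trial change: pick one spin uniformly at random.
        i, j = rng.randrange(L), rng.randrange(L)
        neighbors = (spins[(i + 1) % L][j] + spins[(i - 1) % L][j]
                     + spins[i][(j + 1) % L] + spins[i][(j - 1) % L])
        delta_e = 2.0 * J * spins[i][j] * neighbors
        # Accept with probability min(1, exp(-beta * delta_e)).
        if delta_e <= 0.0 or rng.random() < math.exp(-beta * delta_e):
            spins[i][j] = -spins[i][j]
            accepted += 1
    return accepted


if __name__ == "__main__":
    rng = random.Random(3)
    lattice = [[rng.choice((-1, 1)) for _ in range(16)] for _ in range(16)]
    for sweep in range(100):
        metropolis_sweep(lattice, beta=1.0 / 2.0, rng=rng)
    print(sum(map(sum, lattice)))  # total magnetization M after 100 sweeps
```

Recording E and M after each sweep yields the samples from which the averages
and fluctuations defined above can be estimated.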

The number of Monte Carlo steps per particle plays an important role in 
Monte Carlo simulations. On the average, the simulation attempts to change 
the state of each particle once during each Monte Carlo step per particle. We 
will refer to the number of Monte Carlo steps per particle as the “time,” even 
though this time has no obvious direct relation to physical time. We can view 
each Monte Carlo time step as one interaction with the heat bath. The effect of 
this interaction varies according to the temperature T, since T enters through 
the Boltzmann probability for flipping a spin. 

The temperature dependency of the physical quantities ⟨M⟩ and C is shown
in Fig. 21(a) and Fig. 21(b). For temperature T = 0, we know that the
spins are perfectly aligned in either direction, thus the mean magnetization per
spin is ±1. As T increases, we see in Fig. 21(a) that ⟨M⟩ decreases continuously
until T = Tc, at which ⟨M⟩ drops to 0. This Tc is known as the critical
temperature and separates the ferromagnetic phase T < Tc from the paramagnetic
phase T > Tc. The singularity associated with the critical temperature Tc is
also apparent in Fig. 21(b). The peak in the heat capacity at the transition is
related to the large fluctuations found near the critical temperature. The peak
becomes sharper for larger systems but does not diverge because the lattice has
a finite size (singularities are only found in an infinite system).




Fig. 21. Temperature dependency of mean magnetization and specific heat.
(a) Temperature dependence of the mean magnetization per spin for lattice sizes
32 × 32, 64 × 64, and 128 × 128. (b) Temperature dependence of the specific heat
for lattice sizes 32 × 32, 64 × 64, and 128 × 128



5.2.3 Continuous-Time Ising Spin System The standard Ising spin model
represents a certain discrete-time model, as Monte Carlo steps are regarded to 
be time steps. However, the transient evolution of the Ising spin configurations 
is considered an artifact. Glauber [20] introduced continuous-time probabilistic 
dynamics for the Ising system to represent the time evolution of the physical 
system. 

The Ising spin model with continuous-time probabilistic dynamics cannot 
be solved by Monte Carlo simulation, since time has no explicit implication on 
the evolution of the system in the Monte Carlo execution model. To capture the 
asynchronous continuous-time dynamics correctly, the problem is mapped to the 
ACA model and is executed by event-driven simulation. 

In the continuous-time Ising spin model, a spin is allowed to change its state,
a so-called spin flip, at random times. The attempted state change arrivals for a
particular spin form a Poisson process. The Poisson arrival processes for different
spins are independent, but the arrival rate is the same for each spin. Similar
to the Monte Carlo simulation, the attempted spin flip, or trial, is realized by
calculating the energy difference ΔE between the new configuration and the old
configuration. The spin flip is accepted with the Boltzmann probability e^{-βΔE}.

The discrete-time and continuous-time models are similar. They have the
same distribution of the physical equilibrium quantities and both produce the
same random sequences of configurations. The difference between the two models
is the time scale at which the configurations are produced: in discrete time,
the time intervals between trials are equal, and in continuous time, the time
intervals are random and exponentially distributed.
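
A sequential sketch of the continuous-time dynamics is given below. It uses the
fact that N independent Poisson processes with equal rate are equivalent to one
Poisson process with N times that rate whose arrivals are assigned to a
uniformly chosen spin. The square lattice with periodic boundaries, the function
name and its parameters are assumptions of this illustration, and the sketch does
not include the parallel Time Warp execution discussed in this section.

```python
import math
import random


def continuous_time_ising(spins, beta, t_end, rate=1.0, J=1.0, seed=0):
    """Event-driven simulation on an L x L lattice (modified in place).
    Returns the accepted flips as a list of (time, i, j) tuples."""
    rng = random.Random(seed)
    L = len(spins)
    t = 0.0
    flips = []
    while True:
        # Exponentially distributed time until the next attempted flip
        # anywhere on the lattice (merged rate: rate * number of spins).
        t += rng.expovariate(rate * L * L)
        if t > t_end:
            return flips
        i, j = rng.randrange(L), rng.randrange(L)
        neighbors = (spins[(i + 1) % L][j] + spins[(i - 1) % L][j]
                     + spins[i][(j + 1) % L] + spins[i][(j - 1) % L])
        delta_e = 2.0 * J * spins[i][j] * neighbors
        if delta_e <= 0.0 or rng.random() < math.exp(-beta * delta_e):
            spins[i][j] = -spins[i][j]
            flips.append((t, i, j))


if __name__ == "__main__":
    lattice = [[1] * 8 for _ in range(8)]
    events = continuous_time_ising(lattice, beta=1.0 / 3.0, t_end=10.0)
    print(len(events), events[:3])
```

The time stamps attached to the flips are what distinguishes this model from the
discrete-time Metropolis sweep; the acceptance rule itself is unchanged.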
5.2.4 Optimistic Simulation of the Parallel ACA Model The resulting
continuous-time Ising spin model is parallelized by geometric decomposition. The
Ising spin lattice is partitioned into sub-lattices, and the sub-lattices are mapped
onto parallel processors. To minimize the communication between sub-lattices,
each processor stores local copies of the neighbor boundaries (see Fig. 22). By
maintaining local copies of neighbor boundaries, spin values are only communicated
when they are actually changed, rather than when they are only referenced.
A spin flip along the boundary is communicated to the neighbors by an event
message. The causal order of the event messages, and thus of the spin updates, is
guaranteed by the optimistic simulation mechanism.




Fig. 22. Spatial decomposition of the Ising spin lattice. The grey areas are local 
copies of neighbor boundary strips. For example, processor PE 2 has a local copy 
of spin “a” owned by processor PE 1. Processors PE 2 and PE 3 both own a 
copy of spin “c”. The arrows in the figure indicate the event messages sent upon
a spin flip 



Asynchronous Cellular Automata, and thus also the Ising spin model, put
additional requirements on the original formulation of the Time Warp method.
For example, the Time Warp method, as all optimistic PDES methods, must save
its state vector each time an event is executed. The state vector of a spatially
decomposed ACA can be arbitrarily large, that is, all the cells in the sub-lattice
are part of the state vector. For efficient memory management, we incorporate
incremental state saving in the Time Warp method [51]. Incremental state saving
stores not the full state vector, but saves only the changes to the state vector
due to the execution of an event, which is only a small fraction of the full state.
Besides efficient memory management, incremental state saving also reduces the
time overhead related to the memory copy.

With incremental state saving, no full copy of a state vector at a certain
simulation time exists in the simulation execution environment. Instead, upon
a rollback of a series of events, the state vector is reconstructed by processing
the event-partial state collection in reverse order. Although incremental state
saving requires less state saving time and memory, there is an increased cost of
state reconstruction. In general, the number of rolled back events is a fraction of
the number of events executed during forward simulation. Given this fraction of
rolled back events, and given that the state saved per event is on the order of
10 bytes versus on the order of 10^ bytes, incremental state saving is
favorable in spatially decomposed ACA applications.
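
The idea of incremental state saving with rollback can be sketched for a
sub-lattice of spins as follows. The class, its method names and the flat list
of spins are hypothetical and serve only to illustrate the mechanism; they do not
reproduce the Time Warp kernel used for the experiments, and aspects such as
anti-messages and the reclaiming of old log entries are left out.

```python
class IncrementalState:
    """Spin state with an undo log instead of full state copies."""

    def __init__(self, spins):
        self.spins = list(spins)
        self.log = []  # entries: (time stamp, spin index, previous value)

    def apply_flip(self, timestamp, index):
        """Execute a spin flip event and save only the changed value.
        Events are assumed to be applied in non-decreasing time order."""
        self.log.append((timestamp, index, self.spins[index]))
        self.spins[index] = -self.spins[index]

    def rollback(self, timestamp):
        """Undo, in reverse order, all events with a time stamp greater
        than or equal to `timestamp`; return how many were undone."""
        undone = 0
        while self.log and self.log[-1][0] >= timestamp:
            _, index, previous = self.log.pop()
            self.spins[index] = previous
            undone += 1
        return undone


if __name__ == "__main__":
    state = IncrementalState([1, 1, 1, 1])
    state.apply_flip(1.0, 0)
    state.apply_flip(2.5, 3)
    state.apply_flip(4.0, 0)
    # A message with time stamp 2.0 arrives late: roll back to that time.
    print(state.rollback(2.0), state.spins)
```

Each log entry holds a few values only, whereas a full state copy would hold the
whole sub-lattice; the price is that a rollback has to replay the log backwards.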

Parallel Performance Results To validate the efficacy of the optimistic Time 
Warp simulation method, we have designed and implemented the continuous- 
time Ising spin model to study the parallel scalability behavior of the system. 
The experiments with the Ising spin model were performed on the Distributed 
ASCI Supercomputer (DAS) [15]. The DAS consists of four wide-area distributed
clusters with a total of 200 Pentium Pro nodes. ATM is used to realize the wide-area
interconnection between the clusters, while the Pentium Pro nodes within a 
cluster are connected with Myrinet system area network technology. 

To determine the speedup and relative efficiency of the parallel Ising spin 
implementation, the execution time of the parallel simulation on one processor is 
compared with the execution time on different numbers of processors. Figure 23(a) 
shows the relation between speedup and the number of processors for a fixed 
problem size. Together with the results from Fig. 23(b), we can see that the 
parallel Ising spin for T = 2.0 scales almost linearly up to 6 processors, but 
eventually drops to a relative efficiency of 0.83 for 8 processors. For temperature 
T = 3.0 the relative efficiency decreases gradually to 0.68 for 8 processors. 
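
The two measures can be related with a short Python sketch, assuming the usual definitions: speedup is the one-processor time divided by the p-processor time, and relative efficiency is speedup divided by p. The function names are illustrative; only the efficiencies 0.83 and 0.68 come from the text.

```python
def speedup(time_one, time_p):
    """Relative speedup: parallel code on 1 processor versus on p processors."""
    return time_one / time_p


def relative_efficiency(time_one, time_p, processors):
    """Speedup per processor; 1.0 would be ideal linear scaling."""
    return speedup(time_one, time_p) / processors


def speedup_from_efficiency(efficiency, processors):
    """Invert the definition to recover the speedup from a reported efficiency."""
    return efficiency * processors


# Reported relative efficiencies for 8 processors.
speedup_t20 = speedup_from_efficiency(0.83, 8)
speedup_t30 = speedup_from_efficiency(0.68, 8)
```

Under these definitions, the reported efficiencies correspond to speedups of roughly 6.6 (T = 2.0) and 5.4 (T = 3.0) on 8 processors.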

The decreasing efficiency is mainly due to the increased costs to synchronize 
the parallel processes. The difference in parallel performance for different tem- 
peratures T can be explained by the measure of dynamic behavior that depends 
on the temperature of the system. According to the Boltzmann acceptance probability, 
more trials are accepted as the temperature increases. If there 
are more changes in the system, relatively more synchronization messages must 
be sent between the sub-lattices, which affects the performance negatively. With 
the increase of the number of processors, the time overhead to synchronize the 
parallel simulation processes increases even more as there are more parallel pro- 
cesses that have to find their mutual synchronization point in time. 
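
The temperature dependence can be made concrete with a small Python sketch. It assumes the common Metropolis form of the acceptance probability, min(1, exp(-dE / T)) with the Boltzmann constant set to 1, which may differ in detail from the expression used in this paper. The energy difference of 4.0 is an arbitrary illustrative value.

```python
import math


def acceptance_probability(delta_e, temperature):
    """Metropolis-style acceptance probability with k_B = 1 (assumed form)."""
    if delta_e <= 0:
        return 1.0  # energy-lowering trials are always accepted
    return math.exp(-delta_e / temperature)


# Same energy-raising trial at the three temperatures used in the experiments.
probabilities = {t: acceptance_probability(4.0, t) for t in (2.0, 2.4, 3.0)}
```

For a fixed energy-raising trial the acceptance probability grows with T, so more spin flips, and hence more synchronization messages, occur at higher temperatures.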

The parallel ACA model with the Time Warp execution mechanism is an 
effective solver for continuous-time Ising spin systems. The microscopic ACA 
rules defining the spin flip probabilities describe the macroscopic magnetization 
behavior of the Ising spin system (see Fig. 21(a) and Fig. 21(b)). The optimistic 
simulation method scales reasonably well with the number of parallel processors, 
although precautions have to be taken. The required overhead time to synchro- 
nize the simulation increases with the number of processors. The synchronization 
overhead can be reduced by limiting the optimism of the Time Warp mecha- 
nism. The optimism control effectively bounds the time retardation between the 
parallel processors such that synchronization between the processors is accomplished 
faster. 



Distributed Simulation with Cellular Automata 







[Figure 23, panels: (a) Speedup of parallel Ising spin for lattice size 128 x 128; (b) Relative efficiency of parallel Ising spin for lattice size 128 x 128. Horizontal axis: processors; curves for T = 2.0, T = 2.4 and T = 3.0.]

Fig. 23. Parallel performance of the Ising spin simulation 



6 Summary and Discussion 

A common denominator in the next generation scientific computing applications 
is the understanding of multi-scale phenomena, where systems are studied over 
large temporal and spatial scales. The simulation of these natural phenomena 
requires ever-increasing computer performance. Although computer performance 
as such still doubles approximately every year, it is our strong belief that 
the development of algorithms for modern computer architectures lags behind. 
One of the biggest challenges is, therefore, to develop completely new algorithmic 
approaches that support efficient modeling of natural phenomena and, at 
the same time, support efficient distributed simulation. Hence we need to boost 
the computational power instead of the computer power. 

One way to approach this is to look closely at the way nature itself performs 
computation. This is a largely unexplored research field. In this paper 
we discussed the concept of interacting virtual particles whose dynamics give 
rise to complex behavior, by using Cellular Automata as a compute metaphor 
for modeling and distributed simulation. 

As specific instances of Cellular Automata we described the concepts and use 
of parallel Lattice Gas Automata and the Lattice Boltzmann model. Although 
these models have been around for a decade, they were mainly studied from a 
theoretical physics point of view. Our interest is to study them from a compu- 
tational science point of view, to apply them to real-life natural phenomena and 
to compare them with real experiments. We are at the front of the second wave 
of interest in discrete particle models, where the computational aspects and the 
modeling abilities are the main research questions. 

In addition, we described a new approach to efficient distributed execution of 
asynchronous cellular automata through discrete event execution and applied it 
to different biological and physical models. 






P. M. A. Sloot et al. 



In the near future we will set up an international collaboration to use the 
developed models and concepts in the exploration of various challenging problems 
stemming from biology, ranging from tumor growth models to population 
dynamics. Population dynamics models, for instance, can be used to understand 
fluctuations in natural populations, and are fundamental in fishery, ecological 
research and management of nature reserves. A well-known example of a population 
dynamics model is the set of Lotka-Volterra equations, first proposed by Volterra 
to explain the oscillatory levels of certain fish catches in the Adriatic Sea. 

The inability of Lotka-Volterra models to capture the individual stochastic 
interaction has motivated the application of Cellular Automata as an alternative 
modeling paradigm [45,69]. In the CA model, the populations of prey and 
predators are no longer considered as homogeneous collections of individuals with 
identical average properties. CA models form the basis for population dynam- 
ics models based on discrete individuals, where the behavior of the individuals 
is formulated by microscopic rules. An additional advantage is that individual 
processes, such as movement in space, growth, reproduction, behavioral and eco- 
logical interaction, can be represented explicitly. 

A synchronous update scheme for the CA model is not realistic from a biological 
point of view: it is not likely that groups of individuals move through space 
at exactly the same time. As each individual behaves independently 
from the others, both in time and space, an asynchronous update scheme 
is required. The ACA model and the resulting event-driven simulation associate 
a simulation time with each update, thus enabling a more meaningful interpretation 
of the time evolution of individual-based models. 
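
The asynchronous, event-driven update scheme can be sketched in a few lines of Python. This is only an illustration under stated assumptions: each cell schedules its own next update with exponentially distributed waiting times, and the update rule `toggle` is a placeholder. Neither is taken from the models discussed in this paper.

```python
import heapq
import random


def simulate_async(n_cells, rule, end_time, seed=0):
    """Event-driven asynchronous update: exactly one cell changes per event."""
    rng = random.Random(seed)
    state = [0] * n_cells
    # Every cell gets its own next update time (exponential waiting times assumed).
    events = [(rng.expovariate(1.0), cell) for cell in range(n_cells)]
    heapq.heapify(events)
    history = []
    while events and events[0][0] <= end_time:
        time, cell = heapq.heappop(events)
        state[cell] = rule(state, cell)  # only this cell is updated
        history.append((time, cell))
        heapq.heappush(events, (time + rng.expovariate(1.0), cell))
    return state, history


def toggle(state, cell):
    """Placeholder rule: flip the cell between 0 and 1."""
    return 1 - state[cell]


final_state, history = simulate_async(n_cells=5, rule=toggle, end_time=3.0)
```

Because every update carries its own simulation time, the history is a time-ordered sequence of individual events rather than a series of global generations.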

Abstract. The Group-By operation is widely used in relational Data 
Warehouses for the aggregation and presentation of results according 
to user-defined criteria. Rearranging Group-By and join operations has 
been proposed as an optimization technique that reduces the size of 
the input relation. Pipelining is another optimization technique that 
promotes intra-query parallelism by transferring intermediate results to 
the next operation without materializing them. In Data Warehouses 
and environments in which exploratory on-line queries are common, 
pipelining can be used both for optimizing joins and for the reduc- 
tion of response time by presenting partial results to the user. Efficient 
pipelining often depends on the order of the intermediate results. Order- 
dependent operations, such as Group-By, typically require the complete 
result set before ordering it, thereby making efficient pipelining impos- 
sible. In this paper we exploit bitmap indexing to implement Group-By 
in a manner that can be used for pipelining. Algorithms are presented 
for different assumptions about buffer availability and for different sort 
orders. 



1 Introduction 

Data Warehouses integrate the data from on-line transaction processing systems 
and a variety of other internal and external sources to provide historical data 
that can be used in support of the decision making process. Data Warehouses 
must be capable of responding, in addition to predefined report-type queries, to 
interactive exploratory queries. In the interactive mode the perceived response 
time, i.e. the time between issuing a query and the appearance of the first results, 
is critical. 

The size of typical Data Warehouses, their peculiar design in the form of variants 
of the star-schema, and the fact that data access is primarily for reading purposes 
have an impact on the storage structures that are used, the indexing methods 
or secondary access paths, the implementation of the operators of the relational 
algebra and the query processing strategies followed. 

The design of a Data Warehouse is often based on a star schema. In this design 
the aggregatable data from a business transaction, for example, the number of 
units sold and the value of the sale, are placed in a fact table that is linked 
through a foreign key relationship with the dimension tables that contain all 
the descriptive information. As a rule of thumb, fact tables contain many small 
records but the historical nature of a Data Warehouse causes the fact tables 
to be extremely large. The dimension tables typically consist of relatively few 
but long records with many descriptive fields. Dimension tables are in general 
disjoint and have no attributes in common. 

The disjoint nature of the dimension tables has an important effect on the 
processing of joins: any pairwise join involves the large fact table. To be efficient, 
join algorithms must be developed that minimize the access to the fact table, 
ideally reducing it to a single scan or even less through the use of precomputed 
join indexes, or by exploiting clustering and the corresponding metadata. In 
addition, the writing of intermediate results, which may in turn be rather large, 
should be avoided whenever possible. This fact and the need to present (partial) 
results to the user or the next processing step as soon as possible make pipelined 
joins desirable. 

Another key operation in Data Warehousing is the Group-By. The Group- 
By operation applies one of the standard SQL aggregation operators (SUM, 
COUNT, MAX, MIN, AVG) to an aggregatable attribute and groups the result 
according to the specified grouping attribute. For example, car sales can be 
summed up and grouped by model. Group-Bys are often applied as the last 
operation before the ordered results are presented to the user. Previous research 
has shown the advantage of pushing Group-Bys past joins to reduce the size 
of the input relations [9,3]. We contend that Group-Bys are prime candidates 
for pipelining, thus allowing either the user or the next operation to use partial 
results as they become available. 
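
As a point of reference, the car sales example can be written as a minimal Python sketch of a conventional (blocking) Group-By. The data and the function name `group_by_sum` are made up for illustration.

```python
from collections import defaultdict


def group_by_sum(rows, group_key, value_key):
    """SUM(value_key) grouped by group_key, returned in group order."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[group_key]] += row[value_key]
    # The sorted result is only available after the whole input has been read.
    return sorted(totals.items())


sales = [
    {"model": "sedan", "units": 3},
    {"model": "coupe", "units": 1},
    {"model": "sedan", "units": 2},
    {"model": "van", "units": 4},
]
totals_by_model = group_by_sum(sales, "model", "units")
```

Note that no group can be emitted before the last input row has been seen, which is exactly the property that prevents pipelining.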

In [5] sampling is used to produce approximate results for online aggregation. 
While a statistical approach with gradual refinement may be enough to satisfy 
user requirements in some decision support environments, it is not adequate 
as input to other operations. In this paper we show how bitmap indexes can 
be exploited to produce online exact results of the Group-By operation. These 
results are produced in the proper sort order and can be used both to reduce 
the turnaround time and to optimize the execution time through pipelining. 

Bitmap indexes are popular secondary access methods for Data Warehouses 
[6, 1, 7, 2, 8]. Bitmaps are compact data structures that have a low processing 
cost and are easy to maintain. In contrast to id-list indexes, such as 
B-trees, the information of key values and their tuple-ids is encapsulated in 
bits and bit positions. The operations on bitmaps are inexpensive bitwise oper- 
ations. Bitmaps offer the possibility of evaluating selection predicates directly 
on the index and of combining different selection criteria through logical oper- 
ations on the index without having to access the underlying data. Using id-list 
indexes, compound selection predicates can either be evaluated by choosing the 
most selective index to filter the operand table followed by a sequential scan on 
the temporary table and the evaluation of the rest of the predicates, or directly 
by a multiple index scan, which uses all the available indexes to retrieve sets of 
tuple-ids, followed by set operations on them. Both of these methods are more 
expensive than the use of bitmaps. 

Query processing strategies depend heavily on the available access paths 
and on the implementation of the operators. Since the availability of bitmaps 
greatly impacts the query processing strategies, the query processor should be 
bitmap-enabled, for example, by processing queries on the index without retriev- 
ing the actual data. One crucial advantage derived from the use of bitmaps is 
the fact that the intermediate results derived from index processing are them- 
selves bitmap indexes that can be further exploited. So far, efforts reported in 
the literature have concentrated on the development of new variants of bitmaps 
and their use and effectiveness in processing individual operations, mostly selec- 
tions. To take full advantage of the new index structures and the algorithms 
that use them for individual operations, query processors/optimizers must be 
bitmap-enabled to exploit the new indexes for the global query optimization 
process. They should also include optimization criteria that favor pipelining and 
the early presentation of results to the user thus reducing the perceived response 
time. 

The remainder of this paper briefly reviews bitmap indexing in Section 2 
and pipelining in Section 3; Section 4 discusses the optimization of Group-Bys; 
Section 5 shows how the Group-By operation can be supported through the use 
of bitmap indexes; and Section 6 discusses future work and provides conclusions. 



2 Bitmap Indexing Revisited 

Bitmap indexes of different structure have been proposed as secondary access 
methods for Data Warehouses [6, 1, 7, 2, 8]. In principle, bitmap indexes encapsulate 
the information of attribute values and tuple-ids in bits and bit positions. 
In their simplest form, bitmap indexes store a bit vector for each value of the 
domain of an attribute. The length of a bit vector is equal to the cardinality of 
the indexed table. A “1” is stored in the vector at the position that corresponds 
to the position of the tuple if the attribute of that tuple has the value of the cor- 
responding bit vector. Otherwise a “0” is stored. Simple bitmap indexing works 
well for small domains but the vectors become rather sparse in large domains. 
To solve this problem the bit vectors can either be compressed, resulting in a 
large number of short and dense vectors, or they can be encoded, resulting in a 
small (logarithmic) number of regular length dense vectors [7]. Encoded Bitmap 
Indexing preserves not only the positioning of the bits but also the simplicity of 
application. 
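
The simple form of a bitmap index can be sketched in Python as follows. Bit vectors are represented as plain lists of 0 and 1 for readability; a real implementation would pack them into machine words. All names and the sample column are illustrative.

```python
def build_simple_bitmap_index(column):
    """Return {value: bit vector} with one bit per tuple position."""
    index = {}
    for position, value in enumerate(column):
        vector = index.setdefault(value, [0] * len(column))
        vector[position] = 1
    return index


def bitwise_or(left, right):
    """Combine two selection results without touching the base table."""
    return [a | b for a, b in zip(left, right)]


model = ["sedan", "coupe", "sedan", "van", "coupe"]
index = build_simple_bitmap_index(model)
sedan_or_van = bitwise_or(index["sedan"], index["van"])
```

With one vector per distinct value, each vector has the length of the table, which is why large domains lead to many sparse vectors.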

An encoded bitmap index defines first an encoding function that is reflected in 
a mapping table and then defines the corresponding Retrieval Boolean Function 
for each attribute value. For example, given an attribute A with the domain 
{a, b, c, d, e, f, t, u, v, w}, an Encoded Bitmap Index on A consists of a mapping 
table, a set of Boolean retrieval functions and a set of bitmaps, as shown in 
Figure 1. 

[Figure 1: (a) Mapping table; (b) Retrieval min-terms, one Boolean function over the bitmaps b3, b2, b1, b0 for each encoded value, including NULL.]

Fig. 1. An Example of Encoded Bitmap Indexing on A 



The bitmaps $b_3, b_2, b_1, b_0$ are set according to the encoded values of the 
indexed attribute, e.g., for those tuples with $A = a$, their corresponding bits 
in $b_3, b_2, b_0$ are cleared, and the bits in $b_1$ are set to 1. The retrieval Boolean 
functions are also defined based on the encoded values, e.g., the retrieval Boolean 
function for the value "a", denoted by $f_a$, is defined by $\overline{b_3}\,\overline{b_2}\,b_1\,\overline{b_0}$, where a '0' in 
the encoded value is expressed by the negation of the Boolean variable. Using the 
Encoded Bitmap Index on A to evaluate $A \in \{a, b, c, d\}$, the retrieval Boolean 
functions of the selected values are selected and composed to form a logical 
disjunction, i.e., $f_a + f_b + f_c + f_d$, which is further reduced to $\overline{b_3}\,b_1$. The 1 bits 
in the resulting bitmap indicate those tuples satisfying the selection condition. 
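
The evaluation of such a selection can be sketched in Python. Only the code 0010 for the value a follows from the text; the codes assigned to b, c, d, e and t below are assumptions, chosen so that the four selected values share a cleared b3 and a set b1. The function names are illustrative.

```python
# Hypothetical mapping table; only the code of "a" is given in the text.
CODES = {"a": 0b0010, "b": 0b0011, "c": 0b0110, "d": 0b0111,
         "e": 0b1000, "t": 0b1010}


def build_encoded_index(column, codes, width=4):
    """bitmaps[k][pos] is bit k of the code of column[pos]."""
    return [[(codes[v] >> k) & 1 for v in column] for k in range(width)]


def min_term(bitmaps, code, pos):
    """Retrieval function of one value: every bitmap must match the code bit."""
    return all(bitmaps[k][pos] == ((code >> k) & 1) for k in range(len(bitmaps)))


def select(bitmaps, codes, values, n):
    """Disjunction of the min-terms of all selected values."""
    return [int(any(min_term(bitmaps, codes[v], p) for v in values))
            for p in range(n)]


def select_reduced(bitmaps, n):
    """The reduced expression: NOT b3 AND b1."""
    b3, b1 = bitmaps[3], bitmaps[1]
    return [int((not b3[p]) and b1[p]) for p in range(n)]


column = ["a", "e", "c", "t", "b", "d"]
bitmaps = build_encoded_index(column, CODES)
full = select(bitmaps, CODES, ["a", "b", "c", "d"], len(column))
reduced = select_reduced(bitmaps, len(column))
```

Under this assumed encoding, the reduced expression touches only two of the four bitmaps yet selects the same tuples as the full disjunction.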

There are other variations of bitmap indexing, such as bit-sliced indexes 
[6,1,2]. In spite of many variations of bitmaps, there are two points in common: 
1) the design principle is to reduce the space requirement of the indexes while 
preserving the performance; and 2) the indexes are complete, i.e., for selections of 
any given value a single bitmap can be generated, and those and only those “1” 
bits indicate the desired tuples. With these properties, our methods discussed 
later can be applied to other variations of bitmap indexes, not just Encoded 
Bitmap Indexes. 

The bitmaps we have described so far are all tuple-level. That means, 
one bit is used to represent one tuple in the base table. In the processing of 
Group-Bys and other operations, such as Joins, page-level bitmaps are used, 
because of the nature of block I/Os. One tuple-level bitmap is uniquely mapped 
to one page-level bitmap. The page-level bitmap is constructed as follows. Suppose 
that we have a bitmap $b$, where $|b|$ denotes the length of $b$ in bits, and 
$w$ is the blocking factor of a physical page, i.e., the number of tuples in a page. 
Then, the page-level bitmap of $b$, denoted by $b_p$, consists of $\lceil |b|/w \rceil$ bits. The $i$-th 
bit in $b_p$ is set, if any bit between the $(i \cdot w)$-th and the $((i+1) \cdot w - 1)$-th bits 
in $b$ is set, where $0 \le i < \lceil |b|/w \rceil$. In other words, the $i$-th bit in $b_p$ is set, if any 
tuple in the $i$-th page is selected based on $b$. 
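
The construction above translates directly into a short Python sketch. The function name is illustrative, and positions are counted from 0 as in the definition.

```python
def page_level_bitmap(b, w):
    """Collapse a tuple-level bitmap b into one bit per page of w tuples."""
    pages = (len(b) + w - 1) // w  # ceil(|b| / w)
    return [int(any(b[i * w:(i + 1) * w])) for i in range(pages)]


tuple_bitmap = [0, 0, 1, 0,   0, 0, 0, 0,   0, 1]
page_bitmap = page_level_bitmap(tuple_bitmap, w=4)
```

In this example the ten tuples occupy three pages of up to four tuples each, and only the first and third page contain a selected tuple.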

3 Pipelining 

When the cost of an operation is evaluated, a significant portion of that cost is 
due to the materialization and the necessary writing of the result, particularly in 
situations in which the materialized intermediate result cannot be kept in main 
memory and must be written to disk. A convenient way to reduce this cost is 
through pipelining. 

When multiple operations must be performed in sequence it is more efficient 
to pass on the intermediate result to the next operation directly, thereby avoiding 
the cost of writing for the first operation and the cost of reading for the next 
operation. For example, if a join of three tables T1, T2, T3 must be performed, 
it can be performed by joining first T1 with T2 and then joining the resulting 
table T4 with T3. Instead of waiting until the whole join is performed and 
writing the intermediate result, the query processor can initiate the second join 
with the tuples of the result from the first join as these become available. If 
the results are produced in the right order, an inexpensive merge join can be 
performed. However, if the intermediate result is not in the right sort order a 
more expensive join implementation must be chosen or the intermediate result 
must first be materialized and sorted. 
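
A pipelined three-way join can be sketched with Python generators. This is a simplified illustration: both inputs of each merge join are assumed to be sorted on the join key, and keys on the right-hand side are assumed to be unique. The tables and column names are invented.

```python
def scan(table):
    """Produce the rows of a table one at a time."""
    for row in table:
        yield row


def merge_join(left, right, key):
    """Merge join of two streams sorted on key; right keys assumed unique."""
    right = iter(right)
    current = next(right, None)
    for row in left:
        while current is not None and current[key] < row[key]:
            current = next(right, None)
        if current is not None and current[key] == row[key]:
            yield {**row, **current}  # passed on immediately, never stored


T1 = [{"k": 1, "a": "x"}, {"k": 2, "a": "y"}, {"k": 3, "a": "z"}]
T2 = [{"k": 1, "b": 10}, {"k": 3, "b": 30}]
T3 = [{"k": 3, "c": True}]

# T4 = T1 join T2 is consumed by the second join as it is produced.
pipeline = merge_join(merge_join(scan(T1), scan(T2), "k"), scan(T3), "k")
result = list(pipeline)
```

The intermediate table T4 exists only as a stream here. This works because the first join emits its rows in key order, which is the order the second merge join requires.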

In Data Warehousing the processing of the fact tables must be optimized. 
Fortunately, those joins are between the partial keys of the records in the fact 
table which are the keys of the dimensions. The load process of the Data Ware- 
house typically produces good clustering that can be exploited for pipelining. 
For example, data corresponding to all the transactions of the day at a given 
store are loaded overnight, and result in a clustering by date and store. A coarse 
page level bitmap index or a hierarchically encoded bitmap index can reflect 
these facts and can be used to support pipelined joins. Pipelining can improve 
intra-query parallelism drastically in the typical multiprocessor environments. 

In environments in which exploratory queries are common, a user often issues 
a query online and waits for its result. Pipelining has two advantages in such an 
environment: a) the response time is reduced for the user, since partial results can 
be displayed as they become available, and b) substantial savings can be realized 
if the user decides to terminate the query after inspecting the first results. 

However, the user typically expects the data to be grouped and sorted by 
some criterion before it is presented. When the sort order that is expected by 
the user does not correspond to the order in which the results are generated, the 
result must first be completely computed and later sorted. Therefore, implemen- 
tations of the Group-By operation that produce the results incrementally in the 
right order and can be pipelined, offer a large potential for query optimization. 

4 Optimization of Group-By 

A generic Data Warehouse query has the following form: 



SQL 1 

    Select    non-aggregate attributes 
              aggregation(aggregatable attribute) 
    From      dimension tables 
              fact table 
    Where     join conditions 
              restriction conditions 
    Group-By  non-aggregate attributes 
    Order-By  sort attributes 

Conventional optimizers are likely to produce query trees in which the restric- 
tion conditions are applied first, followed by the Joins, executing the Group-Bys 
after the joins, followed by sorting for the final presentation order. 

We will illustrate a possible optimization of the Group-By statement with a 
small example taken from [3]. 

Example 1. Given is the Data Warehouse of a factory that models the orders of 
its dealers in a fact table Order and three dimension tables, Division, Product and 
Dealer. The factory has some sectors. A sector consists of some divisions, and a 
division produces some products. The schema is as follows. 

Division(DivID, SectorID) 
Product(ProdID, Cost, DivID) 
Order(OrderID, ProdID, DealerID, Amount, Date) 
Dealer(DealerID, State, Address) 



Note that this schema has been normalized. In the terminology of Data 
Warehousing this normalized schema results in a snowflake, i.e. the dimensional 
attributes that form a hierarchy must be combined again through join operations 
on the subdimensions. Therefore, in Data Warehousing it is often recommended 
that attributes that are functionally dependent on other dimensional attributes 
are placed in the same dimension table. In this case, SectorID could have been 
placed in the Product dimension since redundancy and updates are not a major 
issue in a Data Warehouse. We will illustrate the optimization first on the nor- 
malized (snowflake) example and will later show that the same optimization is 
valid for the common (non-normalized) Star schema design. 

Suppose that we want to query the sum of the orders for each sector of the 
factory. In SQL this query would be: 

SQL 2 

    Select    Sum(Amount) 
    From      Order, Product, Division 
    Where     Order.ProdID = Product.ProdID And 
              Product.DivID = Division.DivID 
    Group-By  Division.SectorID 

The query trees of SQL 2 are depicted in Figure 2. Figure 2(a) shows the exe- 
cution tree generated by traditional query optimizers which postpone Group-Bys 
until all joins are done. In [9,3], it is proposed to process Group-Bys before joins. 
The benefit of pushing down the Group-Bys in the execution tree is the reduc- 
tion of the intermediate result which is the input to the join and in turn reduces 
the join cost. Figure 2(b) shows an execution tree resulting from pushing down 
the Group-Bys. In the query tree of Figure 2(b) the first Group-By is performed 
on the base data, while subsequent Group-Bys are performed on intermediate 
results. Since operations that are performed on the base data usually can exploit 
existing index structures while intermediate results are not indexed, there is a 
clear advantage to performing the operations on base data. 

Figure 2(c) depicts the execution tree of SQL 2 by pushing all the Group-Bys 
in Figure 2(b) down to the lowest operator level. 



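[Figure 2: execution trees of SQL 2. 
(a) sum(Amount) Group SectorID, applied to Join(Join(Order, Product), Division). 
(b) sum(Amount) Group SectorID, applied to Join(X, Division), where X is sum(Amount) Group By DivID applied to Join(Y, Product), and Y is sum(Amount) Group By ProdID applied to Order. 
(c) A single sum(Amount) Group By {ProdID} applied directly to Order, with Product and Division above it.] 

Fig. 2. Execution trees of SQL 2 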



Pushing the Group-Bys to the level of the base data is possible whenever 
functional dependencies exist that uniquely determine the grouping attributes 
at the higher level. This situation will always arise when snowflaking was used to 
break out hierarchies in a dimensional attribute. For example, ProdID functionally 
determines DivID (ProdID → DivID), which in turn functionally determines SectorID 
(DivID → SectorID). The Group-By can be pushed down if no other attributes of 
the snowflaked dimension(s) are needed in the final result. In the example, the 
Group-By can be executed on the corresponding subsets of ProdID since there 
exists a partitioning of the attribute domain of ProdID into several disjoint 
subsets such that all the values in one subset 
map to the same value of SectorID. 
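
That pushing the Group-By down preserves the result can be checked with a small Python sketch. The functional dependencies ProdID → DivID and DivID → SectorID are modelled as dictionaries; the data values and function names are invented for illustration.

```python
from collections import defaultdict


def sum_by(rows, key, value):
    totals = defaultdict(int)
    for row in rows:
        totals[row[key]] += row[value]
    return dict(totals)


def late_group_by(orders, prod_to_div, div_to_sector):
    """Join every order up to its sector first, then aggregate."""
    joined = [{"SectorID": div_to_sector[prod_to_div[o["ProdID"]]],
               "Amount": o["Amount"]} for o in orders]
    return sum_by(joined, "SectorID", "Amount")


def early_group_by(orders, prod_to_div, div_to_sector):
    """Aggregate on the base data first, then join the much smaller result."""
    per_product = sum_by(orders, "ProdID", "Amount")
    per_sector = defaultdict(int)
    for prod, amount in per_product.items():
        per_sector[div_to_sector[prod_to_div[prod]]] += amount
    return dict(per_sector)


orders = [{"ProdID": "p1", "Amount": 5}, {"ProdID": "p2", "Amount": 7},
          {"ProdID": "p1", "Amount": 1}, {"ProdID": "p3", "Amount": 2}]
prod_to_div = {"p1": "d1", "p2": "d2", "p3": "d3"}
div_to_sector = {"d1": "s1", "d2": "s1", "d3": "s2"}

late = late_group_by(orders, prod_to_div, div_to_sector)
early = early_group_by(orders, prod_to_div, div_to_sector)
```

Both strategies return the same totals per sector, but the early variant joins one row per product instead of one row per order.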

In the more common Star schema, functionally dependent attribute hierarchies 
are often placed in the same dimension table as the dimension table’s key. 
The dimension table’s primary key is used as part of the fact table’s key. As long 
as functional dependencies exist between the dimension table’s primary key and 
the grouping attribute specified in the Group-By clause, it is possible to find 
partitions of primary keys that correspond directly to the higher-level grouping 
attribute, and the Group-By can then be executed directly on the base data. 

If the functional dependencies among attributes are modelled in the data 
catalog, then they can be directly applied to find the partitioning, in our example 
the partitioning on ProdID. Or, if an Encoded Bitmap Index (EBI) is defined 
on the products and their hierarchies, namely “sectors-divisions-products” , the 
information about the dimension hierarchies can be hashed into the mapping 
table together with ProdIDs. 

5 Execution of Group-By 

The interesting question is whether the Group-By can be executed on the base 
tuples in such a way that the result is produced in the required order and can be 
used for pipelining. Traditionally, if the result of the Group-By operation 
fits into main memory, the operation can be evaluated by sequentially scanning 
the operand table once and sorting the result of the aggregation according 
to the grouping attribute. If the result of the Group-By does not fit in memory, 
hashing or sorting is first applied to the operand table, followed by some 
merges of the intermediate results to produce the final result [4]. These imple- 
mentations try to optimize query execution time by minimizing the execution 
time of individual operations. 

In Data Warehousing, however, response time is often a more important 
performance metric than query processing time. Under this circumstance, the 
key question is: “How to reschedule the page accesses such that we can minimize 
the turn-around time, and at the same time keep the query processing cost low?” 

In this section we present Group-By algorithms that exploit bitmap indexing. 
With the help of bitmaps, we are able to reschedule the I/O accesses to produce 
Group-By results that can be pipelined and can be used to minimize the turn- 
around time, which is defined by the time interval from query submission until 
the first result is available. 

The first algorithm, for which the pseudocode is shown in Figure 3, requires 
sufficient buffer space to buffer all the intermediate results of the Group-By 
aggregation as well as the pertinent index structures. 

We assume that a bitmap index exists on the fact table, more precisely on 
the partial key that is the primary key of the dimension table in which the 
grouping attribute is located. Based on functional dependencies, the mapping 

Algorithm 1 [Online Grouping by Bitmaps] 

Input:  grouping attribute(s) G, whose domain is further divided into 
        a sequence of grouping subsets (g_1, ..., g_k), g_i ⊆ domain(G), 
        and {b_1, ..., b_k} are the grouping bitmaps for {g_1, ..., g_k}, respectively 
        an aggregate function f, and the attribute to be aggregated, A 
        an operand table T 
Output: grouping results 

Begin 
    define a bit vector toDo with the same length as b_i (1 ≤ i ≤ k); 
    let toDo = 1 (all bits set); 
    define a perfect hashing function ht 
        such that ht(v) = ht(v'), ∀ v, v' ∈ g_i (1 ≤ i ≤ k); 
    define a hash table HT; HT[i] denotes the i-th entry of HT 
    for each b_i in the sequence (b_1, ..., b_k) 
        for each j-th bit in b_i, denoted by b_i[j] 
            if (b_i[j] & toDo[j]) 
                read in the j-th page of T, denoted by T[j]; 
                for each tuple t in T[j] 
                    cumulate t.A into HT[ht(t.G)]; 
                clear toDo[j]; 
        output f(HT[ht(g)]), for any g ∈ g_i; 
End 



Fig. 3. Bitmap-based Group-By, unlimited buffer 



from the grouping attribute to sets of the dimension key (a partial key of the 
fact table) is performed. For the purpose of this algorithm we do not require 
a particular type of bitmap index, since a page-level bitmap $\{b_1, \ldots, b_k\}$ for 
the set of grouping subsets $(g_1, \ldots, g_k)$ can be derived, i.e. for each value in 
the domain of the grouping attribute a bit vector can be maintained in which 
a 1 is set if the page of the fact table contains a tuple that is relevant for the 
aggregation under that particular value of the grouping attribute. The Group-By 
operation is then performed in the proper sort order. In the first aggregation, 
with the help of the page level bitmap all pages are read and scanned that are 
needed for the first aggregation subset $g_1$. The first aggregation is complete after 
this pass and can be pipelined. The partial aggregations corresponding to other 
aggregation subsets are kept in memory but the pages that have been processed 
are discarded. For each page that has been processed, the corresponding position 
is set to zero in the ToDo vector, an auxiliary bit vector of the same length as 
the page-level bitmaps. For the second aggregation subset the corresponding 
page-level bit vector is ANDed with the ToDo vector and the resulting pages are 
retrieved and processed. This time the aggregation corresponding to the second 
aggregation subset will be completed and can be pipelined. After updating the 
ToDo vector the process is repeated until all subsets are processed. Note that 
no pages are read twice and they can be discarded right away. 
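
The procedure described above can be sketched as a Python generator. This is an illustration under simplifying assumptions, not the authors' implementation: pages are in-memory lists of (grouping value, amount) pairs, the aggregate is a sum, a dictionary stands in for the hash table HT, and `group_of` plays the role of the hash function ht. The optional `read_log` only records which pages are fetched.

```python
def online_group_by(pages, group_bitmaps, group_of, read_log=None):
    """Yield (group, sum) in group order, reading each page at most once.

    pages:         list of pages, each a list of (g_value, a_value) tuples
    group_bitmaps: list of (group, page-level bitmap) in output order
    group_of:      maps a grouping-attribute value to its group
    """
    to_do = [1] * len(pages)
    totals = {}
    for group, bitmap in group_bitmaps:
        for j, bit in enumerate(bitmap):
            if bit and to_do[j]:
                if read_log is not None:
                    read_log.append(j)
                for g_value, a_value in pages[j]:
                    key = group_of(g_value)
                    totals[key] = totals.get(key, 0) + a_value
                to_do[j] = 0  # the page is never needed again
        yield group, totals.pop(group, 0)  # complete, can be pipelined


sector = {"p1": "S1", "p2": "S1", "p3": "S2"}
pages = [[("p1", 5), ("p3", 2)], [("p2", 7)], [("p1", 1), ("p2", 4)], [("p3", 3)]]
bitmaps = [("S1", [1, 1, 1, 0]), ("S2", [1, 0, 0, 1])]

log = []
results = list(online_group_by(pages, bitmaps, sector.get, read_log=log))
```

In this example the first group is complete after three page reads. The second group then needs only the one page that has not been processed yet, because its share of page 0 was already accumulated during the first pass.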

If the buffer is limited and all the values for the incomplete aggregations 
cannot be kept in main memory, the above algorithm must be modified. Fig- 
ure 4 presents the pseudocode of the modified algorithm for the case of limited 
buffer space. In this variant of the algorithm, only a portion of the hash table 
containing the partially aggregated values and a limited number of bitvectors 
can be maintained in memory. Therefore, the aggregations are processed by sub- 
sets of g. Bitvectors that correspond to the already processed values of g are 
cleared and replaced by the next group. Since not all aggregations can be per- 
formed at the same time, some tuples may have to be read more than once, i.e. 
some pages may have to be accessed repeatedly. This algorithm is suboptimal if 
the Group-By is considered in isolation but allows pipelining by producing the 
results incrementally in the right order of the grouping attribute. 



Algorithm 2 [Online Grouping by Bitmaps with Limited Buffer Size] 

Input:  grouping attribute(s) G, whose domain is further divided into 
        a sequence of grouping subsets (g_1, ..., g_k), g_i ⊆ domain(G), 
        and {b_1, ..., b_k} are the grouping bitmaps for {g_1, ..., g_k}, respectively 
        an aggregate function f, and the attribute to be aggregated, A 
        an operand table T 
        a buffer of size M 
Output: grouping results 

Begin 
    define a hash table HT; HT[i] denotes the i-th entry of HT, such that i ≤ M; 
    define a perfect hashing function ht such that ht(v) = ht(v'), ∀ v, v' ∈ g_{(i % M)+1}; 
    let m = 1, n = M; 
    for each b_i in the sequence (b_1, ..., b_k) 
        for each j-th bit in b_i, denoted by b_i[j] 
            if (b_i[j]) 
                read in the j-th page of T, denoted by T[j]; 
                for each t in T[j] 
                    cumulate t.A into HT[ht(t.G)], if t.G ∈ g_l, m ≤ l ≤ n; 
                clear all b_l[j], ∀ m ≤ l ≤ n; 
        output f(HT[ht(g)]), for any g ∈ g_i; 
        let m = m + 1, n = n + 1; 
End 

Fig. 4. Bitmap-based Group-By, limited buffer 
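
The trade-off of the limited-buffer variant (bounded memory in exchange for repeated page reads) can be illustrated with a simplified Python sketch. It is not a transcription of Fig. 4: as an assumption made for brevity, the window of active subsets advances in whole blocks of size `window` rather than sliding by one subset, the aggregate is SUM, and all names are hypothetical.

```python
def group_by_limited_buffer(pages, subsets, page_bitmaps, key, value, window):
    """Process `window` subsets at a time; return (results, page_reads)."""
    results, page_reads = [], 0
    for start in range(0, len(subsets), window):
        active = range(start, min(start + window, len(subsets)))
        wanted = set().union(*(subsets[s] for s in active))
        table = {}                       # bounded hash table for this window only
        for p in range(len(pages)):
            if any(page_bitmaps[s][p] for s in active):   # OR of active bitmaps
                page_reads += 1          # a page may be read again in a later window
                for t in pages[p]:
                    if key(t) in wanted:
                        table[key(t)] = table.get(key(t), 0) + value(t)
        for s in active:                 # emit the groups of each finished subset
            results.extend((g, table[g]) for g in sorted(subsets[s]) if g in table)
    return results, page_reads


pages = [[("a", 1), ("c", 5)], [("b", 2)], [("a", 3), ("b", 4)]]
subsets = [{"a"}, {"b", "c"}]
page_bitmaps = [[1, 0, 1], [1, 1, 1]]
get_key, get_value = (lambda t: t[0]), (lambda t: t[1])
one_at_a_time = group_by_limited_buffer(pages, subsets, page_bitmaps, get_key, get_value, 1)
all_at_once = group_by_limited_buffer(pages, subsets, page_bitmaps, get_key, get_value, 2)
```

Both runs produce the same groups in the same order; the smaller window simply pays for its smaller hash table with additional page reads.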



So far we have considered only the common case where the resulting sort 
order is determined by the grouping attribute. However, from the generic query 
we saw that other sort orders may be specified through a separate Sort By 
clause. In particular we are interested in the case in which the sort order is 
based on the aggregated attribute; for example, the user wants to aggregate
the sales of cars grouped by model but needs them sorted by total sales. By
providing implementations that combine several operations, it becomes possible
to optimize the execution without having to modify the language semantics.
Figure 5 shows the pseudocode of an algorithm that produces the result of a 
Group-By in the order of the aggregation attribute. Through the use of bitmap 
indexing and the auxiliary information that is automatically available in such 
an index, i.e. the number of tuples in each group, it becomes possible to produce 
the aggregations in the proper sort order, thus making it possible to pipeline 
partial results. 



Algorithm 3 [Online Ordering by Bitmaps]

Input:  grouping attribute(s) G, whose domain is further divided into
        a sequence of grouping subsets, (g_1, ..., g_k), g_i ⊆ domain(G), and
        {b_1, ..., b_k} are the tuple-level grouping bitmaps for {g_1, ..., g_k}, respectively
        an aggregate function, f, and the attribute to be aggregated, A
        numbers of tuples in each group, c_1, ..., c_k
        an operand table, T, and
        vertical bitwise partition on A, denoted by A_(m-1), ..., A_0

Output: grouping results

1)  Begin
2)    define arrays sum[] and seq[] of k integers;
3)    let sum[i] = b_i^T · A_(m-1), i = 1, ..., k;  // b_i^T denotes the transpose of the bit vector b_i
4)    assign seq[i], i = 1, ..., k, such that sum[seq[1]] ≤ sum[seq[2]] ≤ ... ≤ sum[seq[k]]
5)    repeat
6)      let done = TRUE, h = 1;
7)      for i = 2 to k
8)        if ((sum[seq[i]] - sum[seq[i-1]]) < c_seq[i-1])
9)          done = FALSE;
10)         break;
11)     if (!done)
12)       let h++;
13)       let sum[i] = 2 × sum[i] + b_i^T · A_(m-h), i = 1, ..., k;
14)       assign seq[i], i = 1, ..., k, such that sum[seq[1]] ≤ ... ≤ sum[seq[k]]
15)   until (done or h > m);
16)   if (C(remaining bitmaps) < C(seq[1]))
17)     let sum[i] = 2 × sum[i] + b_i^T · A_(m-j), i = 1, ..., k and j = h+1, ..., m;
18)   else
19)     call OnlineGroupingByBitmaps((g_seq[1], ..., g_seq[k]), {b_seq[1], ..., b_seq[k]});
20) End



Fig. 5. Bitmap-based Group By, sorted by aggregated value 



The algorithm works as follows. In order to determine the order in the
sequence, the bitmaps $A_{m-1}, \dots, A_0$ are read one at a time, until there exists a
total ordering in $(sum[seq[1]], \dots, sum[seq[k]])$. If the cost of using the sequence
to calculate the first grouping result is larger than the cost of merging the remaining
$(m - h + 1)$ bitmaps, $A_{m-h+1}, \dots$, into $sum[i]$ ($i = 1, \dots, k$), then the remaining
$(m - h + 1)$ bitmaps, $A_{m-h+1}, \dots$, are used to produce the final result; otherwise
Algorithm 1 is called to perform the grouping.
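
The accumulation rule used in steps 3, 13, and 17 of Fig. 5 (double the running sum, then add the number of set bits the group shares with the next bit-slice) can be checked with a small Python sketch. It covers only the bit-sliced summation, not the ordering test or the cost comparison; the function name `partial_sums` and the toy data are assumptions made for illustration.

```python
def partial_sums(group_bitmaps, slices):
    """Yield the running per-group sums after each bit-slice of A.

    group_bitmaps -- one 0/1 list per group (tuple-level grouping bitmaps)
    slices        -- bit-slices of A as 0/1 lists, most significant slice first
    """
    sums = [0] * len(group_bitmaps)
    for a in slices:
        for i, b in enumerate(group_bitmaps):
            ones = sum(x & y for x, y in zip(b, a))   # inner product b_i^T . A_j
            sums[i] = 2 * sums[i] + ones
        yield list(sums)


values = [5, 3, 6, 1]                                  # attribute A, 3 bits wide
slices = [[(v >> j) & 1 for v in values] for j in (2, 1, 0)]
group_bitmaps = [[1, 1, 0, 0], [0, 0, 1, 1]]           # group 0: tuples 0-1, group 1: tuples 2-3
steps = list(partial_sums(group_bitmaps, slices))
```

After the last slice the running sums equal the exact group totals (5 + 3 and 6 + 1); the intermediate rows are the totals of the leading bits only, which is the information Algorithm 3 uses to decide whether the order is already determined.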

6 Conclusions 

Bitmap indexing has rapidly become a standard feature of Data Warehouse 
platforms. The previously reported use of bitmap indexing has been mostly for 
search and select processing with particular emphasis on the optimization of 
individual operations. In this paper we have taken a broader view and have shown 
how a bitmap-enabled query processor can exploit bitmap indexes to enforce 
other optimization criteria, such as response time rather than query execution 
time. To support this view we have developed and presented algorithms that 
use bitmap indexing to implement the Group-By operator in such a way that 
results can be produced incrementally in the proper sort order. This makes the 
results available both for pipelining to other operations, such as Join, and to the 
end-user working online with the Data Warehouse in an interactive mode. 

To take full advantage of bitmap indexing, new bitmap-enabled query proces- 
sors and optimizers are needed. Future work will concentrate on the development 
of such an optimizer that integrates and exploits the benefits of bitmap indexing 
across the whole query. 
Abstract. This talk will give an overview of an interdisciplinary
research project being developed at The University of Memphis, led by a
team of computer scientists, psychologists, and educators. The project’s
goal is to research and develop prototypes for an intelligent autonomous
software agent capable of tutoring a human user on a narrow, but fairly
open, domain of expertise. The chosen prototype domain is computer
literacy. The agent interacts with the user in natural language and other
modalities. It receives input in typewritten form, possesses a good deal
of syntactic and semantic capabilities to interpret inputs in a
context-relevant fashion, selects appropriate responses (short feedback,
dialog moves), and completes the dialog cycle in multimodal form (feedback
delivered in short spoken expressions and/or facial gestures, spoken
information delivery and pointing to appropriate illustrations, animations,
etc.). The performance of the agent is expected to be consistent with the
level of performance of untrained human tutors. The talk will give a brief
overview of the overall architecture of the tutor, explore some of the
challenges and tools that have been used in solving them, and provide a
demo of the current version, AutoTutor, with an emphasis on the
multimodal delivery of the dialog cycle.



AutoTutor can be seen as consisting of two large interacting modules:
language and sizzle. The language module is there to understand the student’s
input (text from keyboard at present). It currently consists of several
submodules, including parsers (for syntactic analysis), latent semantic analysis
(LSA, for meaning extraction), a speech act classifier (has the student given an
answer or asked a question?) and dialog moves (how should AutoTutor respond to
the student’s input?). The sizzle module is there to take the abstract description of

* This group consists of over twenty faculty and students in psychology, computer 
science and education funded by the National Science Foundation. They include 
currently: Pat Chipman, Scotty Craig, Rachel DiPaolo, Stan Franklin, Max Gar- 
zon, Barry Gholson, Art Graesser, Doug Hacker, Derek Harter, Xiangen Hu, Bianca 
Klettke, Roger Kreuz, Kirsten Link, Zhijun Lu, William Marks, Brent Olde, Natalie 
Person, Victoria Pomeroy, Katja Wiemer-Hastings, Peter Wiemer-Hastings, Holly
White, and several new students. 

J. Pavelka, G. Tel, M. Bartosek (Eds.): SOFSEM’99, LNCS 1725, pp. 261-263, 1999. 

(c) Springer-Verlag Berlin Heidelberg 1999
the pedagogically best response in the dialog turn to give the student, as
determined by the language modules, and enact it, i.e. give it back to the student
in multimodal and ergonomically natural form. This module is enacted by a 
talking head and includes back-channel feedback (visual gestures including emo- 
tional reactions for the students), verbal short feedback (e.g., appropriate frozen 
expressions from a selected repertoire), and/or information splices pointing to 
appropriately selected pictures and animations illustrating the target concept. 

We use two approaches to talking head design embodying sizzle delivery. 
In the so-called ‘canned’ approach, the sizzle is put together from picture files 
shown in appropriate succession. These files need to be choreographed manually 
(currently using Microsoft’s Agent program [6]). This approach is potentially
computationally taxing because of timewise expensive frequent access of external 
memory. On the other hand, it allows high quality artistic design, rendering, and 
rapid prototyping. In the second so-called ‘on-line’ approach, the sizzle gener- 
ates entirely on the fly the graphics, sketches, and animations required to deliver 
each of AutoTutor’s dialog turns. This approach would allow AutoTutor to react
in real-time when integrated with the language submodules. On the other hand, 
it presents a challenge to computer scientists to integrate standards (such as 
MPEG4) for face and body part graphics and animation, with the relevant out- 
put from the language module in order to show naturalistic facial expressions 
and gestures that convey the target feedback (emotions, gestures). In either 
case, the sizzle is context sensitive, yet can be implemented independently and 
autonomously by AutoTutor. The driving goal is to make AutoTutor a truly 
autonomous agent able to visualize graphically the smarts derived from its nat- 
ural language prowess. Studies are in progress to evaluate the effectiveness of 
AutoTutor. 

There are several questions posed by this type of interactive agent. The
computational issues, already mentioned above, concerning rendering graphics
and animation involved in facial and bodily features are certainly of research
interest. Despite a large literature on animating human-like figures (see Badler
et al., 1993; Perlin, 1995, for example) and even faces (the MPEG standard [7], for
example), placing face and pointing arms in a tutorial context where the focus
of the user’s attention is centered elsewhere (comprehension, speech
understanding of a synthetic voice, attention shifting back and forth between the
agent and illustrations) and the agent must process a number of other tasks
(natural language among others) alters dramatically the space of feasible
solutions. The most challenging questions concern the basic principles that govern
agent-human interactions. Despite documented evidence that humans tend to
think of and treat virtual agents in the same way they do humans (Reeves and
Nass, 1996), it is not clear to us that the same kind of devices humans use for
effective communication among humans (gestures, Krauss, 1998; egocentric
generation, Keysar, Barr & Horton, 1998, for example) are the best ways for a
virtual agent to communicate to and with a human. Vice versa, it seems clear that
limitations in the perceptual apparatus of virtual agents (keyboards, or even
speech recognition) will force humans and researchers to adapt and develop more
effective strategies for interaction with this new kind of intelligent agent.
Integrating appropriate computational, cognitivist, and artistic constraints at the
right level of granularity in agents of this type is an open topic for
interdisciplinary research.

References 

1. N. Badler, C. Phillips, B. Webber. Simulating Humans: Computer Graphics,
Animation, and Control. Oxford University Press, 1993.

2. S. D. Craig, B. Gholson, M. Garzon, X. Hu, W. Marks, P. Wiemer-Hastings, Z. Lu,
and the Tutoring Research Group. AutoTutor and Otto Tudor. Int. Conf. on
Artificial Intelligence in Education, Le Mans, France, July 1999.

3. B. Keysar, D. J. Barr, and W. S. Horton. The Egocentric Basis of Language Use:
Insights from a Processing Approach. Current Directions in Psychological Science
7:2 (1998), 46-50.

4. R. M. Krauss. Why do we Gesture When We Speak? Current Directions in
Psychological Science 7:2 (1998), 54-60.

5. L. McCauley, B. Gholson, X. Hu, A. Graesser, and the Tutoring Research Group.
Delivering Smooth Tutorial Dialog Using a Talking Head. Workshop on Embodied
Conversational Characters, S. Prevost and E. Churchill (Eds.), Tahoe City, CA,
1998.

6. Microsoft Agent 2.1. Microsoft Corporation,
http://www.microsoft.com/intdev/agent/.

7. The Moving Pictures Experts Group. Document MPEG96/N1365 (draft): Face
and Body Definitions and Animation Parameters.
http://drogo.cslet.stet.it/mpeg/chicago/animation.htm.

8. K. Perlin. Real-time Responsive Animation with Personality. IEEE Trans. on
Visualization and Computer Graphics 1:1 (1995).

9. B. Reeves and C. Nass. The Media Equation. How People Treat Computers,
Television and New Media like Real People and Places. Cambridge University Press,
New York, 1996.

10. P. Wiemer-Hastings, A. C. Graesser, D. Harter, and the Tutoring Research Group.
The Foundations and Architecture of AutoTutor, in preparation.



Coherent Concepts, Robust Learning 



Dan Roth and Dmitry Zelenko 



Department of Computer Science 
University of Illinois at Urbana-Champaign, USA
{danr,zelenko}@cs.uiuc.edu
http://L2R.cs.uiuc.edu/~danr



Abstract. We study learning scenarios in which multiple learners are 
involved and “nature” imposes some constraints that force the predic- 
tions of these learners to behave coherently. This is natural in cognitive 
learning situations, where multiple learning problems co-exist but their 
predictions are constrained to produce a valid sentence, image or any 
other domain representation. 

Our theory addresses two fundamental issues in computational learning:
(1) the apparent ease with which cognitive systems seem to learn
concepts, relative to what is predicted by the theoretical models, and
(2) the robustness of learnable concepts to noise in their input. This
type of robustness is very important in cognitive systems, where multiple
concepts are learned and cascaded to produce more and more complex
features.

Existing models of concept learning are extended by requiring the tar- 
get concept to cohere with other concepts from the concept class. The 
coherency is expressed via a (Boolean) constraint that the concepts have 
to satisfy. We show how coherency can lead to improvements in the com- 
plexity of learning and to increased robustness of the learned hypothesis. 



1 Introduction 

The emphasis of the research in learning theory is on the study of learning single 
concepts from examples. In this framework the learner attempts to learn a single 
hidden function from a collection of examples (or other, more expressive, modes 
of interaction) and its performance is measured when classifying future examples. 
The theoretical research in this direction [19,21] has already proved useful in that 
it has contributed to our understanding of some of the main characteristics of the 
learning phenomenon as well as to applied research on classification tasks [5,8]. 

One puzzling problem, from both a theoretical and a practical point of view, is
the contrast between the hardness of learning problems - even for fairly simple
concepts - as predicted by the theoretical models, and the apparent ease with which
cognitive systems seem to learn those concepts. Cognitive systems seem to use
far fewer examples and learn more robustly than is predicted by the theoretical
models developed so far.

In this paper we begin the study of a new model within which an explanation 
of this phenomenon may be developed. Key to this study is the observation that 
cognitive learning problems are usually not studied in isolation. Rather, the
input is observed by multiple learners that may learn different functions on the 
same input. In our model, the mere existence of the other functions along with 
the constraints Nature imposes on the relations between these functions - all 
unknown to the learner - contribute to the effective simplification of each of the 
learning tasks. 

Assume for example that, given a collection of sentences where each word is
tagged with its part-of-speech (pos) as training instances, one wants to learn a
function that, given a sentence as input, predicts the pos tag of the $i$th word in
the sentence. E.g., we would like to predict the pos tag of the word can in the
sentence This can will rust^1. The function that predicts this pos may be a
fairly complicated function of other tokens in the sentence; as a result, it may be
hard to learn. Notice, however, that the same sentence is supplied as input to the
function that predicts the pos of the word will and that, clearly, the predictions
of these functions are not completely independent. Namely, the presence of the
function for will may somewhat constrain the function for can. For example,
the constraint may be that these functions never produce the same output when
evaluated on a given sentence. This exemplifies our notion of coherency: given
that these two functions need to produce coherent outputs, the input sentence
may not take any possible value in the input space of the functions (that it could
have taken when the function’s learnability is studied in isolation) but rather
may be restricted to a subset of the inputs on which the functions’ outcomes are
coherent. There exist several possible semantics for the coherency conditions
and here we present only the one that we find most promising in that we can
present results that indicate that the task of learning a concept $f$ becomes easier
in these situations.

A fundamental question in the study of learning is that of data preparation. 
In machine learning it is often found that in order to ensure success at a new 
learning task considerable effort has to be put into creating the right set of vari- 
ables, and into eliminating ones if there are large numbers of these. In cognitive 
learning, on the other hand, there is no evidence for the existence of explicit 
methods for achieving these ends. The learning process appears to proceed with 
the set of previously known functions as the set of variables and overcome these 
problems implicitly. 

The ability to chain predictors and perform inferences that are based on 
learned functions is the second fundamental issue that the coherence assump- 
tion contributes to. To study this we define the notion of robustness of learned 
concepts. In particular, we are concerned with robustness of learnable concepts 
to attribute noise. This type of robustness is important in cognitive systems, 
where multiple concepts are learned and “chained” [20,12,16]. Namely, the out- 
put of one learned predictor may be used as input to another learned predictor. 
Thus errors in the output of one predictor translate to attribute noise in the 
input to another. Therefore, predictors have to tolerate this noise; we show that 



^1 This may not be the exact way one chooses to model the problem [18]. However, this
is a reasonable abstraction that helps deliver the intuition behind our point of view.
learning coherent concepts results in robust concepts and briefly discuss relations
to large margin classification and future work. 

The rest of the paper is organized as follows. In Section 2 we describe the stan- 
dard learning models and delineate the notion of concept coherency. Section 3 
defines a preliminary semantics of concept coherency and studies its implica- 
tions. In Section 4 we define the main semantics of concept coherency. We then 
analyze learning coherent linear separators in Section 4.1. We show that in the 
new model we can achieve a significant reduction in the mistake bound for the 
Perceptron. We also investigate the structural properties of coherent linear sep- 
arators and relate the properties to the mistake bound reduction. In Section 5 
we study the relationship between coherency and robustness to attribute noise. 
First, we introduce a noise model that allows noise to be present when the 
learned hypotheses are being evaluated and define a robustness condition that 
guarantees noise tolerance in this model. Finally, we show that coherency entails
the robustness condition, thus making learned concepts more robust.

2 Preliminaries 

As in the traditional models, the learning scenario is that of concept learning
from examples, where a learner is trying to identify a concept $f \in \mathcal{F}$ when
presented with examples labeled according to $f$. We study learning in the
standard pac [19] and mistake bound [13] learning models. It is well known that
learnability in the pac model depends on the complexity of the hypothesis class.
Specifically, it is equivalent to the finiteness of the VC-dimension [22], a
combinatorial parameter which measures the richness of the function class (see [21,11]
for details). Moreover, it is known [3,6] that the number of examples required
for learning is linear in the VC-dimension of the class. Mistake bound learning
is studied in an on-line setting [13]; the learner receives an instance, makes a
prediction on it, and is then told if the prediction is correct or not. The goal is
to minimize the overall number of mistakes made throughout the learning process.

The usual way to constrain the learning task is to explicitly restrict the
concept class. Here we are mostly concerned with the case in which the restriction is
imposed implicitly via interaction between concepts. More precisely, we are
interested in a learning scenario in which there exist several concepts $f_1, f_2, \dots, f_k$
from the concept class $\mathcal{F}$. Let $g: \{0,1\}^k \to \{0,1\}$ be any Boolean function of $k$
variables. The notion of coherency we study is formalized by assuming that the
concepts $f_1, f_2, \dots, f_k$ are subjected to a constraint $g$. In all cases, however, we
are interested in learning a single function $f_1 \in \mathcal{F}$ under these conditions.



3 Class Coherency 

For purposes of illustration we first explore an overly strong notion of coherency, 
which leads to a restriction on the function class. This is relaxed in the next 
section and yields the main definition. 
Let $\mathcal{F}$ be a concept class over $X$. The direct $k$-product $\mathcal{F}^k$ of the concept
class $\mathcal{F}$ is the set $\mathcal{F}^k = \{f : f = (f_1, \dots, f_k),\ f_i \in \mathcal{F},\ i = 1, \dots, k\}$. Therefore, if
$f \in \mathcal{F}^k$, $f : X \to \{0,1\}^k$. Thus, learning $k$ functions with a binary range can be
reduced to learning a single function with range $\{0, \dots, 2^k - 1\}$.

A theorem in [1] states that this transformation (and its inverse) preserves
PAC learnability^2.

Theorem 1 [1]. $\mathcal{F}^k$ is learnable iff $\mathcal{F}$ is learnable.
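
The reduction from $k$ binary-valued functions to one multi-valued function is simply a binary encoding of the $k$ outputs. A minimal Python sketch follows; the helper name `direct_product` and the two toy predicates are assumptions made for illustration, with the first function taken as the most significant bit.

```python
def direct_product(*fs):
    """Combine k Boolean functions into one function with range {0, ..., 2**k - 1}."""
    def f(x):
        code = 0
        for fi in fs:                       # f_1 becomes the most significant bit
            code = (code << 1) | int(bool(fi(x)))
        return code
    return f


is_positive = lambda x: x > 0               # toy concept f_1
is_even = lambda x: x % 2 == 0              # toy concept f_2
f = direct_product(is_positive, is_even)
codes = [f(x) for x in (2, 1, -2, -1)]
```

Each code can be decoded back into the individual predictions by reading off its bits, which is the inverse transformation mentioned above.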



Definition 1 (Class Coherency). Let $\mathcal{F}$ be a concept class and $g: \{0,1\}^k \to \{0,1\}$
a Boolean constraint. $\mathcal{F}_g \subseteq \mathcal{F}^k$ is a coherent collection of functions if

$$\mathcal{F}_g = \{(f_1, \dots, f_k) \in \mathcal{F}^k : \forall x \in X,\ g(f_1(x), \dots, f_k(x)) = 1\}.$$

Intuitively we can think of $g$ as reducing the range of functions in $\mathcal{F}^k$. That is, if
$Y = g^{-1}(1)$, then we do not care about elements $f \in \mathcal{F}^k$ for which $range(f) \not\subseteq Y$.

The observation that a constraint $g$ reduces the range of the functions in $\mathcal{F}^k$
leads to the following sample size bound for pac-learning which is immediate
from the results of [1].

Theorem 2. Let $m = |g^{-1}(1)|$. Then, the pac learning sample complexity of
$\mathcal{F}_g$ is $O\left(\frac{1}{\epsilon}\left(d(\log m)\log\frac{1}{\epsilon} + \log\frac{1}{\delta}\right)\right)$, where $d$ is any appropriate capacity measure.

Example 1. Let $\mathcal{F}$ be the class of axis-parallel rectangles inside $[0,1]^2$. Let
$g(f_1, f_2) = (f_1 \neq f_2)$. Then $\mathcal{F}_g$ is the class of the pairs $(f_1, f_2)$ of axis-parallel
rectangles, where $f_1$ is the complement of $f_2$ in $[0,1]^2$. Note that in this case $\mathcal{F}_g$
is a class of functions with the binary range $\{01, 10\}$. For binary-valued
functions, the appropriate capacity measure of Theorem 2 is the VC-dimension of
$\mathcal{F}_g$. It is not difficult to see that three points can be shattered by the concept
class, but no four points can. Therefore, $VCD(\mathcal{F}_g) = 3$; however, $VCD(\mathcal{F}) = 4$
and, hence, Theorem 2 implies that the sample complexity of learning the
concept class $\mathcal{F}$ alone is greater than the sample complexity of learning it in the
presence of other functions when they are all constrained by $g$. Thus, adding
more concepts may make learning easier.

While Definition 1 captures the simultaneous nature of the learning scenario,
it is still restrictive in that it imposes global constraints on all the $k$ functions. We
would like to relax this further and emphasize that we are interested in learning
a single function, say $f_1$. We would like to study how the learnability of this
function is affected by the presence of the other functions and the requirement
that they behave coherently. In the next section we suggest the main definition
of this paper.

^2 PAC learnability for multi-valued functions is shown to be characterized by the
finiteness of a capacity measure of a function class, see [1] for details.
4 Distributional Coherency

In the previous section we removed from $\mathcal{F}^k$ any $f$ such that $g(f(x)) = 0$ for
some $x \in X$. Now, for each $f \in \mathcal{F}^k$ we simply restrict the domain of $f$ to $X'$,
where $\forall x \in X'$, $g(f(x)) = 1$. Formally,

Definition 2 (Distributional Coherency). Given a Boolean constraint $g$ and
a class $\mathcal{F}$ of functions, we define the class of $g$-coherent functions $\mathcal{F}^*_g$ to be the
collection of all functions $f^* : X \to \{0,1\}^k \cup \{\perp\}$ defined by

$$f^*(x) = \begin{cases} f(x) & \text{if } g(f(x)) = 1 \\ \perp & \text{otherwise} \end{cases}$$

We interpret the value $\perp$ as a forbidden value for the function $f$. In this way
we restrict the domain of $f$ to the subset $X'$ of $X$ satisfying the constraint $g$.

The constraint semantics in Def. 1 is stronger (more restricting) than the one
above. To see that, let, e.g., $\mathcal{F}$ be the class of (non-identically false) monotone
DNF, and let $g$ be $(f_1 \neq f_2)$. Then, $\mathcal{F}_g$ is empty, because $f_1(\mathbf{1}) = 1 = f_2(\mathbf{1})$ for
any $f_1, f_2 \in \mathcal{F}$. But, in $\mathcal{F}^*_g$, we simply restrict the domain of each $f_1, f_2$ to the
non-overlapping areas of $f_1, f_2$.
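
The difference between the two semantics can be made concrete on a tiny Boolean domain. The Python sketch below wraps a tuple of functions into the restricted function of Definition 2, using `None` as a stand-in for the forbidden value; the names `coherent` and `BOTTOM` and the two monotone functions are assumptions chosen for illustration.

```python
from itertools import product

BOTTOM = None  # stands in for the forbidden value of Definition 2

def coherent(fs, g):
    """Return f*: the tuple (f_1(x), ..., f_k(x)) if g accepts it, else BOTTOM."""
    def f_star(x):
        ys = tuple(int(bool(f(x))) for f in fs)
        return ys if g(*ys) else BOTTOM
    return f_star


f1 = lambda x: x[0] and x[1]        # a monotone term
f2 = lambda x: x[1] or x[2]         # a monotone disjunction
f_star = coherent((f1, f2), lambda a, b: a != b)

domain = list(product((0, 1), repeat=3))
x_prime = [x for x in domain if f_star(x) is not BOTTOM]   # restricted domain X'
```

The all-ones input is mapped to the forbidden value, so under Definition 1 the pair would be discarded altogether, whereas here both functions remain learnable on the four inputs that survive in `x_prime`.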

In the pac learning model the above constraint can be interpreted as
restricting the class of distributions when learning a function $f_1 \in \mathcal{F}$. Only distributions
giving zero weight to the region $X \setminus X'$ are allowed. We formalize this by
introducing the distribution-compatible learning framework.

Definition 3. Let $\mathcal{F}$ be a class of Boolean functions over $X$. Let $f_1, \dots, f_k \in \mathcal{F}$
be subjected to a constraint $g$. Then, a distribution $D$ over $X$ is said to be
$f_1$-compatible w.r.t. $f_2, \dots, f_k \in \mathcal{F}$ and $g$, if $D\{x : f^*(x) = \perp\} = 0$. We denote by
$\mathcal{D}_{f_1}$ the class of all $f_1$-compatible distributions (w.r.t. $f_2, \dots, f_k \in \mathcal{F}$ and $g$).

Note that the restriction on the domain of the target function can be arbitrary 
rather than enforced by a particular Boolean constraint. However, we are mostly 
interested here in the case in which restrictions naturally arise from constraints 
on a collection of functions. 

To motivate investigation into the gain one might expect to have in this 
learning scenario, consider the following example. 

Example 2. Let $\mathcal{F}$ be the class of disjunctions. Consider learning $f_1$ from examples,
in the presence of $f_2$ and the constraint $g = (f_1 \neq f_2)$. Suppose that both $f_1$ and $f_2$
include a literal $l$. The constraint implies that $X'$ does not contain examples where $l$
is 1 (otherwise, both $f_1$ and $f_2$ will be 1 on the examples). Therefore, the constraint
effectively reduces the size of the target disjunction $f_1$ since the existence of literals
common to $f_1$ and $f_2$ in the target disjunction is irrelevant to predictions on $X'$. Thus,
if $n_1, n_2$ is the number of literals in $f_1, f_2$, respectively, and $n_c$ is the number of common
literals, then using an attribute efficient algorithm like Winnow to learn $f_1$ in the
presence of $f_2$ and the constraint $g$ gives a mistake bound of $2(n_1 - n_c)(\log n_1 + 1)$ [13].
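
The claim that common literals become irrelevant on $X'$ can be verified by brute force on a small instance. The sketch below assumes monotone disjunctions over four Boolean variables; the literal sets and helper names are hypothetical and serve only to illustrate the example.

```python
from itertools import product

def disjunction(literals):
    """Monotone disjunction over the given variable indices."""
    return lambda x: int(any(x[i] for i in literals))


n = 4
lits1, lits2 = {0, 1}, {1, 2}              # variable 1 is the common literal
f1, f2 = disjunction(lits1), disjunction(lits2)
reduced_f1 = disjunction(lits1 - lits2)    # f1 with the common literal removed

# X': the inputs on which the constraint f1 != f2 is satisfied
coherent_domain = [x for x in product((0, 1), repeat=n) if f1(x) != f2(x)]
common_always_off = all(x[1] == 0 for x in coherent_domain)
same_predictions = all(f1(x) == reduced_f1(x) for x in coherent_domain)
```

On every input of the restricted domain the common literal is 0, so the shorter disjunction `reduced_f1` makes exactly the same predictions as `f1`; this is why the effective target size drops from $n_1$ to $n_1 - n_c$.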
This model is a generalization of the Blum and Mitchell [2] model. They study
learning two functions $f_1, f_2$ over different domains ($X_1$ and $X_2$, respectively),
where the learner sees only pairs $(x_1, x_2) \in X = X_1 \times X_2$ that satisfy
$f_1(x_1) = f_2(x_2)$. This is a special case of our model, when $x = (x_1, x_2)$ and the functions
$f_1, f_2$ are defined over subdomains $X_1, X_2$ rather than the whole $X$. In Example 2,
if restricted to monotone disjunctions, we get the domain decomposition for free,
because the constraint forces the literal sets of the disjunctions to be disjoint.
Thus, by applying the results of [2]^3, one can quantify the reduction in the
number of examples needed for learning constrained monotone disjunctions.
Next we analyze a more general case of learning in the coherency model.

4.1 Learning Linear Separators 

Let $\mathcal{F}$ be the class of half-spaces in $\mathbb{R}^n$ and let $g$ be $(f_1 = f_2)$. The functions $f_1$
and $f_2$ are depicted in Figure 1. The arrows point in the direction of the positive
half-spaces with respect to the corresponding lines. The constraint $g$ restricts the
domains of both $f_1$ and $f_2$ to the shaded areas. Therefore, when learning $f_1$ (and
similarly, $f_2$) we will see examples only from the shaded areas $X' \subseteq X$. For $x \in X'$,
$f_1(x) = f_2(x)$. While, in principle, learning $f_1$ may be hard due to examples
near the separator, now there are many linear separators consistent with $f_1(x)$.
Therefore, at least intuitively, finding a good separator for $f_1(x)$ would be easier.
For the case when the linear separator is learned via the Perceptron learning
algorithm, we can show the following.

Theorem 3. Let $f_1$ and $f_2$ be two hyperplanes (w.l.o.g., passing through the
origin) with unit normals $w_1, w_2 \in \mathbb{R}^n$, respectively. Let $\alpha = \cos(w_1, w_2) =
w_1 \cdot w_2$. Let $S = S^+ \cup S^-$ be the sequence of positive and negative examples so
that $\forall x \in S$, $f_1(x) = f_2(x)$. Let $S$ be linearly separable by both $f_1$ and $f_2$ with
margins $2\delta_1$ and $2\delta_2$, respectively. If $|x| \le R$ then the number of mistakes
the Perceptron makes on $S$ is bounded by $\beta \frac{R^2}{\delta_1^2}$, where $\beta = \frac{2(1+\alpha)\delta_1^2}{(\delta_1+\delta_2)^2}$.

Proof. For a sequence $S = S^- \cup S^+$ we replace each $x \in S^-$ with $-x$. Then the
standard Perceptron learning algorithm becomes:

    w := (0, ..., 0)
    for all x ∈ S do
        if w · x ≤ 0 then
            w := w + x
        end if
    end for

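Since $S$ is linearly separable by $f_j$ with margin $\delta_j$:

$$\forall x \in S,\quad w_j \cdot x \ge \delta_j > 0, \quad j = 1, 2.$$

Let $w^i$ be the value of $w$ after $i$ mistakes. Then, if the $i$th mistake is made on $x$:

$$|w^i|^2 = (w^{i-1} + x)^2 = |w^{i-1}|^2 + 2(w^{i-1} \cdot x) + |x|^2 \le |w^{i-1}|^2 + R^2 \le iR^2,$$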

^3 The results in [2] require in addition certain conditional independence assumptions.
where the last inequality results inductively. Also, using a similar argument,

$$w^i \cdot w_1 = (w^{i-1} + x) \cdot w_1 = w^{i-1} \cdot w_1 + w_1 \cdot x \ge w^{i-1} \cdot w_1 + \delta_1 \ge i\delta_1. \tag{1}$$

Similarly,

$$w^i \cdot w_2 \ge i\delta_2. \tag{2}$$

Note that (1) and (2) hold simultaneously because $f_1, f_2$ have the same values
on $X'$, and, hence, whenever a mistake is made for $f_1$, a mistake is also made
for $f_2$. It then follows from (1) and (2) that

$$(w^i \cdot (w_1 + w_2))^2 = ((w^i \cdot w_1) + (w^i \cdot w_2))^2 \ge i^2(\delta_1 + \delta_2)^2. \tag{3}$$

We now bound $(w^i \cdot (w_1 + w_2))^2$ from above. By Cauchy-Schwarz:

$$(w^i \cdot (w_1 + w_2))^2 \le |w^i|^2\left(|w_1|^2 + |w_2|^2 + 2(w_1 \cdot w_2)\right) = 2|w^i|^2(1 + \alpha) \le 2iR^2(1 + \alpha). \tag{4}$$

Combining (3) and (4),

$$i^2(\delta_1 + \delta_2)^2 \le 2iR^2(1 + \alpha).$$

Hence,

$$i \le \frac{2(1 + \alpha)R^2}{(\delta_1 + \delta_2)^2}.$$

Recall that, while the general Perceptron mistake bound is $R^2/\delta_1^2$ [15], the mere
presence of $f_2$ and the constraint $g$ improves the mistake bound by a factor
of $\beta$. As $\alpha$ approaches $-1$, the shaded regions become smaller and, hence, $\beta$
approaches 0.
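
The bound derived in the proof, $i \le 2(1+\alpha)R^2/(\delta_1+\delta_2)^2$, can be checked numerically. The Python sketch below is an illustration under stated assumptions rather than an experiment from the paper: the data are synthetic points in the plane, the angle of 2 radians between the normals and the margin threshold 0.05 are arbitrary choices, and $\delta_1, \delta_2, R$ are measured on the generated sample.

```python
import math
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def perceptron_mistakes(examples, max_epochs=1000):
    """Count Perceptron updates; negatives are assumed already replaced by -x."""
    w = [0.0] * len(examples[0])
    mistakes = 0
    for _ in range(max_epochs):
        updated = False
        for x in examples:
            if dot(w, x) <= 0:                       # mistake on x
                w = [wi + xi for wi, xi in zip(w, x)]
                mistakes += 1
                updated = True
        if not updated:                              # a full pass without mistakes
            break
    return mistakes


random.seed(0)
w1 = (1.0, 0.0)
w2 = (math.cos(2.0), math.sin(2.0))                  # alpha = cos(2.0) < 0
examples = []
while len(examples) < 200:
    x = (random.uniform(-1, 1), random.uniform(-1, 1))
    a, b = dot(w1, x), dot(w2, x)
    if a * b > 0 and min(abs(a), abs(b)) >= 0.05:    # coherent, with a margin
        examples.append(x if a > 0 else (-x[0], -x[1]))

R2 = max(dot(x, x) for x in examples)                # R squared
d1 = min(dot(w1, x) for x in examples)               # delta_1 on this sample
d2 = min(dot(w2, x) for x in examples)               # delta_2 on this sample
alpha = dot(w1, w2)
bound = 2 * R2 * (1 + alpha) / (d1 + d2) ** 2
mistakes = perceptron_mistakes(examples)
```

Because every retained example lies on the same side of both hyperplanes, each update moves $w$ toward $w_1$ and $w_2$ at once, which is exactly what inequalities (1) and (2) exploit.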

While Theorem 3 shows the gain in mistake bound when learning $w_1$ (as
a function of $w_2$ and the constraint), it is possible to quantify this gain in an
algorithm-independent way by characterizing the set^4 $E(w_1, w_2)$ of linear
separators consistent with the imposed constraint.

Given $w_2$ and the constraint $g$, denote by $E(w_1, w_2)$ the set of all linear
separators that can be learned without any loss in accuracy when the target
concept is $w_1$. Formally (omitting the dependence on $g$ from the notation), for
any two vectors $w_1, w_2 \in \mathbb{R}^n$, let $X' = \{x \mid x \in \mathbb{R}^n,\ sgn(w_1 \cdot x) = sgn(w_2 \cdot x)\}$.
That is, $X'$ corresponds to the shaded area in Figure 1. Then:

$$E(w_1, w_2) = \{w : w \in \mathbb{R}^n,\ \forall x \in X',\ sgn(w \cdot x) = sgn(w_1 \cdot x)\}$$

Theorem 4 uses the well-known Farkas’ Lemma [14] from linear programming. 

^ The set $E(w_1, w_2)$ depends on the constraint $g$. The results in this section can be
presented for any symmetric constraint but will be presented, for clarity,
only for equality.
Lemma 1. For any matrix $A \in R^{m \times n}$ and vector $c \in R^n$, exactly one of these
conditions holds:

(1) $\{x \mid Ax \le 0,\ c \cdot x > 0\}$ is non-empty; (2) $\{y \mid A^T y = c,\ y \ge 0\}$ is non-empty.

Theorem 4. $E(w_1, w_2) = \{w \mid w = a w_1 + b w_2;\ a, b \in R;\ a, b > 0\}$.

Proof. Denote $W = \{w : w = a w_1 + b w_2;\ a, b \in R;\ a, b > 0\}$. Clearly, $W \subseteq
E(w_1, w_2)$. In order to prove that $E(w_1, w_2) \subseteq W$, we partition $X'$ in two sets

$X'_+ = \{x : x \in R^n,\ w_1 \cdot x > 0,\ w_2 \cdot x > 0\}$ and

$X'_- = \{x : x \in R^n,\ w_1 \cdot x < 0,\ w_2 \cdot x < 0\}$.

Observe that $X'_- = \{-x : x \in X'_+\}$. Fix a $w \in R^n$ so that $w \cdot x > 0$ on $X'_+$.
Hence, $w \cdot x < 0$ on $X'_-$ and $w \in E(w_1, w_2)$. Now apply Lemma 1 with
$A = \begin{pmatrix} w_1 \\ w_2 \end{pmatrix}$ ($A$ is a $2 \times n$ matrix whose rows are $w_1, w_2$), and $c = w$.
Since $w \cdot x < 0$ on $X'_-$, (1) is not satisfied; hence, (2) is satisfied, and
$w = a w_1 + b w_2$, where $a, b$ are some positive numbers. Thus, $E(w_1, w_2) \subseteq W$.

If we require the members of $E(w_1, w_2)$ to be unit vectors, then unconstrained
learning of $f_1$ can be viewed geometrically as searching for a point on the unit
sphere that is close to the target $w_1$. In the presence of $w_2$ and the constraint,
we have the following corollary.

Corollary 1. The intersection of $E(w_1, w_2)$ with the unit sphere is a curve
on the unit sphere in $R^n$ connecting $w_1$ to $w_2$. The length of the curve is
$\cos^{-1}(w_1 \cdot w_2)$.

Thus in the presence of $w_2$ and the constraint, the learning algorithm seeks
a point on the sphere that is close to any of the curve points. As we have shown,
algorithmically, for the Perceptron, this translates to reducing the mistake bound
proportionally to the length of this curve.
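The statement of Theorem 4 and the curve length of Corollary 1 can be checked numerically on a small example. The sketch below is an illustration under stated assumptions: the helper names, the vectors `w1`, `w2` and the sample points are all hypothetical. It forms a positive combination $a w_1 + b w_2$, checks that it agrees in sign with $w_1$ on points where $w_1$ and $w_2$ agree, and computes $\cos^{-1}(w_1 \cdot w_2)$.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def sign(t):
    return (t > 0) - (t < 0)


def conic_combination(w1, w2, a, b):
    """Return a*w1 + b*w2; Theorem 4 concerns the case a, b > 0."""
    return tuple(a * p + b * q for p, q in zip(w1, w2))


def arc_length(w1, w2):
    """Length of the curve of Corollary 1 for unit vectors w1, w2."""
    return math.acos(max(-1.0, min(1.0, dot(w1, w2))))


# Illustrative unit normals and sample points (assumptions of this sketch).
w1 = (1.0, 0.0, 0.0)
w2 = (0.0, 1.0, 0.0)
w = conic_combination(w1, w2, 0.3, 1.7)
points = [(1.0, 2.0, -5.0), (-3.0, -0.5, 4.0), (0.2, 0.1, 9.0), (1.0, -1.0, 0.0)]

# Keep only the points of X', where w1 and w2 assign the same sign.
in_x_prime = [x for x in points if sign(dot(w1, x)) == sign(dot(w2, x))]
agrees = all(sign(dot(w, x)) == sign(dot(w1, x)) for x in in_x_prime)
curve = arc_length(w1, w2)
```

For orthogonal normals the curve is a quarter of a great circle, so its length is $\pi/2$.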

5 Robustness 

In this section we show that the coherence assumption made in this paper has 
the effect of making the learned concepts more robust. We start by defining 
robustness and proving that concepts learned under this model can indeed be 
evaluated robustly (generalizing previous models of attribute noise); we then 
show that learning coherent concepts is robust and discuss the relation to large 
margin theory. 

Definition 4 (Attribute Robustness). For $x, y \in \{0,1\}^n$ let $H(x, y)$ be the
Hamming distance between $x$ and $y$ (that is, the number of bits on which $x$ and
$y$ differ). Let

$$S_k = \{x : \forall y,\ \text{if } H(x, y) \le k \text{ then } f(x) = f(y)\}.$$

We say that the pair $(D, f)$ is $(\epsilon, k)$-robust, if $D(S_k) \ge 1 - \epsilon$.
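For small $n$ the definition can be checked by brute force. The following sketch is only illustrative: the function names, the dictionary representation of $D$ and the majority concept are assumptions introduced here. It enumerates $\{0,1\}^n$, builds $S_k$ and compares its weight under $D$ with $1 - \epsilon$.

```python
from itertools import product


def hamming(x, y):
    return sum(a != b for a, b in zip(x, y))


def robust_set(f, n, k):
    """All x in {0,1}^n whose label is shared by every y within Hamming distance k."""
    cube = list(product((0, 1), repeat=n))
    return [x for x in cube if all(f(y) == f(x) for y in cube if hamming(x, y) <= k)]


def is_robust(f, weights, n, k, eps):
    """Check D(S_k) >= 1 - eps, with D given as a dict from points to probabilities."""
    mass = sum(weights.get(x, 0.0) for x in robust_set(f, n, k))
    return mass >= 1.0 - eps


def majority(x):
    # Example concept on 3 bits (an assumption of this sketch).
    return int(sum(x) >= 2)


s1 = robust_set(majority, 3, 1)
uniform = {x: 1.0 / 8 for x in product((0, 1), repeat=3)}
corners = {(0, 0, 0): 0.5, (1, 1, 1): 0.5}
```

For the 3-bit majority function only the all-zeros and all-ones points survive a single bit flip, so a distribution concentrated on those two points is $(0, 1)$-robust while the uniform distribution is not even $(0.5, 1)$-robust.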
Fig. 2. Constrained Robust Half-spaces



Intuitively, the condition means that w.h.p. all the points in a ball of radius
$k$ around any point $x$ have the same label. This can be relaxed by requiring
$f(y) = f(x)$ to hold only for a $(1 - \gamma)$ portion of the points in the ball
$B_k = \{y : H(x, y) \le k\}$, but we will not discuss this to simplify technical details.

Let $f$ be a concept over $X = \{0,1\}^n$ and let $D$ be a distribution over $X$. We
denote by $D^k_{flip}$ the distribution that results from choosing $k$ bits uniformly and
flipping them. It is easy to see that if $(D, f)$ is $(\epsilon, k)$-robust, and $x \in S_k$, then
flipping $k$ bits of $x$ does not change the value of $f$. Hence,

$$\mathrm{error}_{D^k_{flip}}(f) \le D(x \notin S_k) \le \epsilon$$

and the robustness condition guarantees a small error when evaluating $f$ on the
noisy distribution. It follows that if $h$ is an $\epsilon$-good hypothesis for $f$ under $D$,
and if $(D, f)$ is $(\epsilon, k)$-robust, then $h$ is a $2\epsilon$-good hypothesis under $D^k_{flip}$.

We note that the distribution $D^k_{flip}$ is an example of an attribute noise
model [9,4]. These models usually assume the presence of noise in the learning
stage and aim at learning a good approximation of the target concept over
the original noiseless distribution. However, as can be readily seen (and has been
pointed out in [9]), in a more realistic setting in which the learned hypothesis is
to be evaluated under noisy conditions, this hypothesis may be useless (e.g., consider
a simple conjunction or a xor function). The robustness condition defined
above guarantees that a hypothesis learned in the presence of noise also performs
well when being evaluated under these conditions. This holds for a more general
attribute noise model, the product attribute noise, defined as follows. Let $D$ be a
distribution on the instance space $X = \{0,1\}^n$. Assume that an attribute $i$ of an
example $x \in X$ sampled according to $D$ is flipped independently with probability
$p_i$, $i = 1, \ldots, n$. Denote by $p = \sum_{i=1}^{n} p_i$ the expected number of bits flipped
in an example. We denote by $D^p_{flip}$ the distribution induced this way on $X$.
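The product attribute noise model is easy to simulate. The sketch below is a minimal illustration in which the function names, the flip probabilities and the example point are assumptions: each bit $i$ is flipped independently with probability $p_i$, and the empirical mean number of flipped bits is compared with $p = \sum_i p_i$.

```python
import random


def flip_bits(x, probs, rng):
    """Flip bit i of x independently with probability probs[i]."""
    return tuple((b ^ 1) if rng.random() < p else b for b, p in zip(x, probs))


def expected_flips(probs):
    """p = sum of the per-attribute flip probabilities."""
    return sum(probs)


def hamming(x, y):
    return sum(a != b for a, b in zip(x, y))


rng = random.Random(0)            # fixed seed so the run is repeatable
probs = [0.1, 0.2, 0.0, 0.5]      # illustrative flip probabilities
x = (0, 1, 1, 0)
samples = [flip_bits(x, probs, rng) for _ in range(10000)]
mean_flips = sum(hamming(x, s) for s in samples) / len(samples)
```

Here `mean_flips` should lie close to `expected_flips(probs)`, and the third bit is never flipped because its probability is zero.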
Theorem 5. Let $(D, f)$ be $(\epsilon, k)$-robust. If $k \ge p + [\ldots]$ then
$\mathrm{error}_{D^p_{flip}}(f) \le 2\epsilon$.

Proof. Let $(x, f(x))$ be an example sampled according to $D$. Let $x'$ be the result
of flipping the bits of $x$ according to the noise scheme described above. Denote
by $\Pr$ the product distribution induced by the bit flipping. Then we have:

$$\mathrm{error}_{D^p_{flip}}(f) = D^p_{flip}\{x' : f(x') \ne f(x)\} = D^p_{flip}\{x' : x \in S_k,\ f(x') \ne f(x)\} + D^p_{flip}\{x' : x \notin S_k,\ f(x') \ne f(x)\}$$
$$\le D^p_{flip}\{x' : x \in S_k,\ f(x') \ne f(x)\} + \epsilon \le \Pr\{H(x, x') > k\} + \epsilon.$$

To bound $\Pr\{H(x, x') > k\}$, we let $Y$ be the random variable describing the
number of bits flipped in an example. Note that $Y = H(x, x')$ and $E[Y] = p$.
Writing $k = p + m$,

$$\Pr\{Y > k\} = \Pr\{Y > p + m\} = \Pr\{Y - E[Y] > m\} < e^{-m^2/[\ldots]},$$

where the last inequality follows directly from the Chernoff bounds [10]. Hence,
$\Pr\{H(x, x') > k\} < \epsilon$, and $\mathrm{error}_{D^p_{flip}}(f) < 2\epsilon$.

Thus, if we have an $\epsilon$-good hypothesis for the noiseless distribution, and the
target concept with the underlying distribution satisfies the above $(\epsilon, k)$-robustness
condition, then the hypothesis will also be $3\epsilon$-good for the noisy distribution.

5.1 Coherency Implies Robustness

We now establish the connection between coherency and robust learning. This
is done in the context of learning linear separators, as in Section 4.1.
As before, the target function is $f_1$, and we assume the presence of $f_2$ (w.l.o.g.,
both $f_1, f_2$ pass through the origin), and that they are subjected to the equality
constraint $g$. However, here we restrict the domain of $f_1$ and $f_2$ to $X = \{0,1\}^n$.
Let $D$ be a distribution over $X$. We require the distribution to give small weight
to points around the origin. Formally, let $B_r = \{x : x \in X,\ |x| < r\}$ be the
origin-centered ball of radius $r$. Then we require $D$ to satisfy $D(B_r) < \epsilon$.

Notice that in general, when learning a single linear separator $f$, this property
of $D$ does not imply that $(D, f)$ is robust. The following theorem shows that with
the equality constraint imposed, the property is sufficient to make $(D, f)$ robust.

Theorem 6. Let $f_1$ and $f_2$ be hyperplanes (through the origin) with unit normals
$w_1, w_2 \in R^n$, respectively. Let $\alpha = \cos(w_1, w_2) = w_1 \cdot w_2$. Let $D$ be an
$f_1$-compatible distribution (w.r.t. $f_2$ and the equality constraint $g$) that satisfies
$D(B_r) < \epsilon$, where $r > k\,[\ldots]$. Then, there is a linear separator $f$, so that
$D(x : f(x) \ne f_1(x)) = 0$ and $f$ is $(\epsilon, k)$-robust.
Proof. (Sketch) The idea of the proof is to exhibit a linear separator $f$ that is
consistent with $f_1$ on $D$ and, for all points lying outside $B_r$, has a large “margin”
separating positive examples from negative ones. Let $f$ be the hyperplane bisecting
the angle between $f_1$ and $f_2$. That is, $w = \frac{1}{2}(w_1 + w_2)$, where $w$ is the normal
vector of $f$. By Theorem 4, $w \in E(w_1, w_2)$; hence, $D(x : f(x) \ne f_1(x)) = 0$. Now
fix a point $x \in X$ so that $f_1(x) = f_2(x)$ and $x \notin B_r$. Figure 2 is the projection
of $f_1, f_2, f$ to the 2-dimensional plane determined by the origin, the point $x$ and
$x$'s projection onto $f$. Then we have that $|XD| = |x| \sin(XOD) > [\ldots]$. If $[\ldots]$
then $|XD| > [\ldots]$; hence flipping $\le k$ bits of $x$ will not change the
value of $f$ ($|XD|$ is the distance from $x$ to $f$). Therefore, $f$ is $(0, k)$-robust for
any point of the subdomain $X'$ satisfying the constraint and lying outside of the
ball $B_r$. This implies that $(D, f)$ is $(\epsilon, k)$-robust.
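The construction in the proof sketch can be explored numerically. The sketch below is an illustration under stated assumptions, not the paper's argument: the helper names and the two normals are hypothetical, and the test `label_survives_flips` uses the elementary fact that flipping $k$ bits moves a point of $\{0,1\}^n$ by Euclidean distance $\sqrt{k}$, which gives a sufficient condition that may differ from the expression in the original proof.

```python
import math
from itertools import product


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def sign(t):
    return (t > 0) - (t < 0)


def bisector(w1, w2):
    """Normal w = (w1 + w2) / 2 of the hyperplane bisecting f1 and f2."""
    return tuple(0.5 * (a + b) for a, b in zip(w1, w2))


def distance_to_hyperplane(w, x):
    """Euclidean distance from x to the hyperplane through the origin with normal w."""
    return abs(dot(w, x)) / math.sqrt(dot(w, w))


def label_survives_flips(w, x, k):
    # Sufficient condition only: k bit flips move x by sqrt(k) in Euclidean norm.
    return distance_to_hyperplane(w, x) > math.sqrt(k)


# Illustrative unit normals in R^6 (assumptions of this sketch).
w1 = tuple(c / math.sqrt(6) for c in (1, 1, 1, -1, -1, -1))
w2 = tuple(c / math.sqrt(5) for c in (1, 1, 1, -1, -1, 0))
w = bisector(w1, w2)

cube = list(product((0, 1), repeat=6))
# Points on which f1 and f2 agree (and neither is zero).
coherent = [x for x in cube if sign(dot(w1, x)) == sign(dot(w2, x)) != 0]
consistent = all(sign(dot(w, x)) == sign(dot(w1, x)) for x in coherent)
```

On the coherent points the bisecting separator reproduces the labels of $f_1$, in line with Theorem 4, and points far enough from the bisecting hyperplane keep their label under bit flips.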

We note that the assumption $D(B_r) < \epsilon$ in Theorem 6 is satisfied if there
is a margin $r$ separating positive examples of $f_1$ from its negative examples [21],
so that the weight (with respect to $D$) of examples lying inside the margin is less
than $\epsilon$. Also, existence of such a distributional margin implies that a sample of
examples from the distribution will be linearly separable with margin at least
$r$ with high probability, thus guaranteeing that there is a large margin hyperplane
consistent with the sample that has small error with respect to $D$ [7]. In
particular, we construct such a hyperplane $f$ in the proof of Theorem 6.

6 Conclusions 

This paper starts to develop a theory for learning scenarios where multiple learn- 
ers co-exist but there are mutual compatibility constraints on their outcomes. 
We believe that these are important situations in cognitive learning, and there- 
fore this study may help to resolve some of the important questions regarding 
the easiness and robustness of learning that are not addressed adequately by 
existing models. In addition, we view this model as a preliminary model within 
which to study learning in a multi-modal environment. We have shown that 
within this model the problem of learning a single concept - when it is part 
of an existing collection of coherent concepts - is easier relative to the general 
situation. Moreover, this gain is due only to the existence of the coherency, even 
if the learner is unaware of it. 

The results of this paper are restricted mostly to linear separators - not a 
severe restriction given their universal nature in theory and applications [16,17]. 
Some of the future directions of this work include the study of more general 
families of constraints as well as some algorithmic questions that arise from this 
point of view, including the relations to large margin classification alluded to 
above. 
Abstract. This paper presents some applications of data and signal
processing using artificial neural nets (ANNs) which have been investigated
at the University of Tübingen. The applications cover a wide
range of domains: color restoration, gas sensing systems, internet
information search and delivery, online quality control and
nerve signal processing. The paper presents each application in detail and
describes the problems which have been solved.



1 Introduction 

The relevance of artificial neural nets (ANNs) for industrial applications has risen
significantly in recent years, especially in the domain of data and signal processing.
ANNs owe this to their abilities, such as learning from data samples,
processing nonlinear data, parallelism and insensitivity to noisy data.

This paper comprises a selection of applications of data and signal processing
using artificial neural nets which have recently been investigated at the institute.
The selected applications cover a wide range of domains:

— Color restoration of scanned images^1

— Gas sensing systems^2

— Internet information search and delivery^3

— Online quality control of semiconductor chips^4

— Real time processing of nerve signals^5

^1 The project Restoration of Colors in Scanned Images using Artificial Neural
Networks is partly funded by the Daimler-Benz-Stiftung, project #029547.

^2 The project Molecular Recognition for Analytics and Synthesis using Artificial Neural
Nets is funded by the DFG (Deutsche Forschungsgemeinschaft).

^3 The project OASIS (Open Architecture Server for Information Search and Delivery)
is funded by the European Community under the INCO Copernicus Programme, project
#PL96 1116.

^4 The project SMART Fabrication - Neural Networks: New production concepts in
semiconductor manufacturing is funded by the Bundesministerium für Bildung und
Forschung (BMBF).

^5 The project INTER (Intelligent Neural InTERface) is funded by the European
Community under ESPRIT BR project #8897.

J. Pavelka, G. Tel, M. Bartosek (Eds.): SOFSEM’99, LNCS 1725, pp. 277-294, 1999. 

© Springer-Verlag Berlin Heidelberg 1999
After a brief introduction to artificial neural nets, this paper concentrates on
their applications. We show that ANNs can be applied to a variety of data and
signal processing problems in which their special attributes are required.

2 Artificial Neural Nets 

Artificial neural nets are information processing systems consisting of several
elementary units which are inspired by biological neurons. The elementary unit is
called a 'neuron' like its biological counterpart. Thus, an artificial neuron represents a
mathematical model of a biological neuron. Like a biological neural net, the ANN
transmits the information (activation) between the neurons via unidirectional
connections. Owing to the topology of the net and the weighting of the
connections between the neurons, an ANN is capable of representing linear as well
as nonlinear functions or relationships.

Although a great number of different types of ANNs exist, every ANN
consists of at least 2 layers: an input layer and an output layer. As shown in
Figure 1, most ANNs possess an additional hidden layer. Each layer consists
of a number of neurons; the number of neurons in the input layer is equal to the
number of components of the input vector. The number of neurons in the hidden
layer and in the output layer depends on the application.

[Figure: input layer (4 neurons), hidden layer (2 neurons), output layer (1 neuron)]

Fig. 1. A simple example of an ANN (feedforward net) consisting of an input
layer, a hidden layer and an output layer
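The layer structure of Figure 1 can be written down as a short forward pass. The sketch below is only an illustration: the sigmoid activation, the function names and all weight values are assumptions made for this example and are not taken from the paper.

```python
import math


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def layer(inputs, weights, biases):
    """One fully connected layer: each row of weights feeds one neuron."""
    return [
        sigmoid(sum(w * v for w, v in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def forward(x, hidden_w, hidden_b, out_w, out_b):
    """4 inputs -> 2 hidden neurons -> 1 output neuron, as in Figure 1."""
    return layer(layer(x, hidden_w, hidden_b), out_w, out_b)


# Placeholder weights (hypothetical values); training would adjust them.
hidden_w = [[0.5, -0.2, 0.1, 0.0], [-0.3, 0.8, 0.0, 0.4]]
hidden_b = [0.0, 0.1]
out_w = [[1.0, -1.0]]
out_b = [0.0]

y = forward([1.0, 0.0, 0.5, 0.25], hidden_w, hidden_b, out_w, out_b)
```

The input vector has four components because the input layer has four neurons, and the result `y` is a list with a single value between 0 and 1.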

Before an ANN can be used for an application, it must be trained. Training
consists of presenting data samples to the ANN. By means of special training
algorithms the ANN is able to 'learn' the relationships within the presented data
samples and changes its weights so as to produce the correct corresponding
output.

Depending on the training algorithm, the existing ANNs can be
roughly divided into 2 classes: supervised learning and unsupervised learning
neural nets. Supervised learning neural nets need an output vector with the
desired output for each corresponding input vector. The best-known supervised
learning nets are probably feedforward nets. Different training algorithms can be
applied to feedforward nets. The most common one is the Backpropagation
algorithm [1,2] and its variations.

In contrast to the supervised learning nets unsupervised learning nets don’t 
need a desired output vector corresponding to the presented input vector. Rep- 
resentatives of this class are the self-organizing maps [3,4] and ART (Adaptive 
Resonance Theory) [5]. 

Which of the different algorithms is best suited to a problem depends
on the requirements of the application and its data. The choice is up to the
operator and requires some experience. We have investigated different ANNs for
the applications presented in this paper. The main advantages of ANNs are their
abilities to learn relationships from data samples and to process nonlinear
data, their parallelism and their insensitivity to noisy data.



3 Restoration of Colors in Scanned Images 

In this chapter we present an approach to color restoration in scanned images, 
exploiting this capability. The scanner is a device which represents the entire 
range of color distortions taking place in color image processing. The distortions 
appear as non-linear spatial defects in the color gamut caused by inaccuracies of 
the scanner filters and sensors. The problem of restoration can be represented 
as an approximation of an inverted relationship between the color primaries. 



3.1 Color Representation, Distortions and Approach 

Color calibration is a crucial issue in computer graphics and the printing industry.
Due to inaccuracies and instabilities of the electronic elements
and physical properties of dye or ink, the goal of exactly matching the initial image
remains very difficult to achieve. A challenging task is to compensate for these
distortions numerically at the image processing stage.

Figure 2 illustrates the scale of color distortions. The distortions appear as
spatial defects that are negligible around the grey axis and increase dramatically towards the
frontiers of the color gamut. Importantly, despite this severe deformation
of the color gamut, which is especially strong close to its frontiers, no hysteresis
was observed; therefore we can still recover the geometrical color variation.

A wide range of publications deal with the application of artificial neural
networks to color processing. Conventional color restoration methods such as
polynomial regressions, look-up tables, linear interpolations etc. [6] are
often a compromise between fine color resolution and time demands. This
is caused by the high non-linearity of the distortions combined with a very large
number of pixels per image. A comparative study between a neural network
with Cascade Correlation learning architecture and polynomial approximations
ranging from a 3-term linear fit to a 14-term cubic equation demonstrates that
the neural net outperforms these polynomial approximations [7]. Neural networks
have also been applied to the prediction of the color ink recipe [8]. In the next
section we present our approach to data acquisition, selection of network
architecture and learning method and discuss their effectiveness with respect to this
particular problem.

Fig. 2. Edges of the RGB gamut: original colors (left), combined original and
scanned (centre) and scanned only (right). Gamut corners are marked according
to the pure colors they contain



3.2 Color Data Acquisition and Neural Network Architecture 

Preparation of the data that will be fed into the neural network is a very important
step. Adequate representation and preprocessing (filtering, dimension reduction,
scaling etc.) of input data can have a dramatic influence on the success of a neural
network application. We used a medium-end laser printer and a flatbed scanner
for data acquisition. Each primary color was represented by 11 uniform steps.
Thus the entire RGB space was represented by altogether 1331 color patches.
The RGB values of the scanned color patches formed the input data set for the
neural network, and the RGB values of the original colors in turn formed the
output set. The data cycle is presented in Figure 3.
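The patch count follows directly from the sampling: 11 steps per primary give $11^3 = 1331$ combinations. A small sketch of such a grid is shown below; the function name and the 8-bit range 0..255 are assumptions made for illustration, since the text does not state the value range used.

```python
from itertools import product


def color_patches(steps=11, max_value=255):
    """Uniform grid over RGB space with `steps` levels per primary color."""
    levels = [round(i * max_value / (steps - 1)) for i in range(steps)]
    return list(product(levels, repeat=3))


patches = color_patches()
patch_count = len(patches)  # 11 ** 3 combinations
```

In the setup described above, the original values in such a grid would serve as the desired network outputs, and the scanned values of the printed patches as the network inputs.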





[Figure: color data, color printer, scanner]

Fig. 3. The data cycle: color tables are printed and the paper samples are scanned; the
difference between the resulting data set and the initial colors is used for training
the ANN



Comparing different learning methods such as backpropagation (sigmoid and
Gauss functions), Quickprop, Resilient Propagation (Manhattan-Training) and
an interpolating SOM (I-SOM [9]) on a given data set led to the following
observations: The error gradient descent process of backpropagation was
unacceptably slow. Quickprop showed a quick start in reducing the error but
also easily overtrains. The Rprop algorithm demonstrated good convergence as
well as good generalization on testing data. In attempts to minimize the error
on the testing data we varied the number of hidden neurons. An increase
of the network size beyond 20+20+20 does not result in substantial changes of
the error.
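To make the comparison more concrete, the core of a resilient propagation update can be sketched as follows. This is a generic textbook-style variant (no update in the step after a sign change), not the implementation used in the study; the function name, the parameter values and the quadratic test function are assumptions of this sketch. Only the sign of each partial derivative is used, and each weight carries its own step size.

```python
def rprop_minimize(grad, w, steps=100, eta_plus=1.2, eta_minus=0.5,
                   step_init=0.1, step_min=1e-6, step_max=50.0):
    """Minimize a function from its gradient using sign-based, per-weight steps."""
    w = list(w)
    step = [step_init] * len(w)
    prev = [0.0] * len(w)
    for _ in range(steps):
        g = list(grad(w))
        for i in range(len(w)):
            if prev[i] * g[i] > 0:        # same sign as before: speed up
                step[i] = min(step[i] * eta_plus, step_max)
            elif prev[i] * g[i] < 0:      # sign change: overshoot, slow down
                step[i] = max(step[i] * eta_minus, step_min)
                g[i] = 0.0                # skip the update for this weight once
            if g[i] > 0:
                w[i] -= step[i]
            elif g[i] < 0:
                w[i] += step[i]
            prev[i] = g[i]
    return w


def quadratic_grad(w):
    # Gradient of (w0 - 3)^2 + 10 * (w1 + 1)^2, a toy error surface.
    return [2.0 * (w[0] - 3.0), 20.0 * (w[1] + 1.0)]


w_min = rprop_minimize(quadratic_grad, [0.0, 0.0])
```

Because the magnitude of the gradient is ignored, the badly scaled second coordinate does not slow the search down, which is one common explanation for the robust convergence of this family of methods.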

In the case of Counterpropagation, an interpolating version was used [9].
The Kohonen layer was preset with vectors from the RGB space corresponding
to the scanned color patches, and the Grossberg layer with the original values for
those patches. 216 codebook vectors were chosen, so that the RGB cube was
homogeneously represented. In the best result, the RMS error for the test data
was in the same range as for the feedforward methods.

3.3 Discussion 

We have described an application of neural networks to the restoration of colors in
scanned images. This approach proves to be feasible, and the Resilient Propagation
method shows a good balance of numerical and qualitative characteristics of the
restoration. The neural network offers a promising solution to the problem of
color restoration in digital images.

4 Data Evaluation Method for Hybrid Gas Sensing 
Systems 

Chemical and biochemical sensors are used for a broad spectrum of applications. 
They may be applied in areas like environmental monitoring, process control, 
medical and quality analysis. Arrays of these sensors are usually called electronic 
noses [10,11,12]. The sensor signals serve as input to the feature extraction and a 
subsequent pattern recognition algorithm or multicomponent analysis. In order 
to increase their performance many efforts were made starting with improve- 
ments concerning sampling, filtering, preconditioning via advanced sensor tech- 
nology up to better feature extraction and data evaluation methods. 

As a reference system, the commercial hybrid gas sensing system Moses II
(Modular Sensor System II) has been used for qualitative and quantitative
analyses. The standard setup uses a Quartz Crystal Microbalance Module
(QMB-Module) and a Metal Oxide Module (MOX-Module). Each module is equipped
with eight sensors. For each sample the sensor responses are recorded while the
analyte proceeds through the modules. Depending on the flow, about 500
discrete values are recorded for each sample and each sensor. Usually the maximum
response of a sensor is used as its single feature.
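The feature extraction described here reduces each recorded response curve to one number. A minimal sketch is given below; the function name and the synthetic curves are assumptions, and real curves would hold roughly 500 values per sensor rather than four.

```python
def peak_features(sample):
    """Reduce each sensor's response curve to its maximum value.

    sample: list of response curves, one list of floats per sensor.
    """
    return [max(curve) for curve in sample]


# Synthetic stand-in for one measurement with 16 sensors (illustrative data).
sample = [[0.0, 0.5 * s, 1.0 * s, 0.25 * s] for s in range(1, 17)]
features = peak_features(sample)
```

With two modules of eight sensors each, this yields a feature vector with 16 components per sample.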

For improvements in the pattern recognition and multivariate data analy- 
sis part, common data evaluation methods have been tested systematically for 
various model systems including coffee and maize oil. 

Quite typical for applications of electronic noses, a small number of samples 
is taken as reference data and a large set of test data has to be classified or 
predicted. It is well known that this can get many evaluation methods into 
trouble, especially if high dimensional feature vectors are used. 
4.1 Selected Case Study

4.1.1 Qualitative Analysis As a specific example for applications in the food
industry several coffee brands available on the German market were analyzed
with Moses II. Examined brands included Hanseatica “Espresso dunkel”, Jacobs
“Krönung” (pack 1), Jacobs “Krönung” (pack 2), Melitta “Harmonie”, Melitta
“Auslese”, Jacobs “Mein Mild’Or” and Tchibo “Feine Milde”.




Fig. 4. PCA Scores Plot of five different coffee brands available on the German
market

For each brand six samples were measured. 16 sensors were used with the
peak height as single feature. Thus, for each sample we obtained a feature vector
with 16 components. The resulting PCA Scores Plot for this application is shown
in Figure 4.

Different ratios of training data to test data were used to evaluate the recog- 
nition performance of each method. The results are presented in Table 1. 



Table 1. Recognition performance of selected data evaluation methods: number of
correctly classified test samples (in %)

Part of the data used    KNN       BPN       SOM/G     RBF       ART
as reference data
2/3                      100 %     100 %     97.6 %    97.6 %    90.5 %
1/2                      100 %     97.6 %    97.6 %    97.6 %    78.6 %
1/3                      87.5 %    97.6 %    100 %     95.2 %    78.6 %



Application of Artificial Neural Networks for Different Engineering Problems 283 



For artificial neural networks, size and topology are crucial parameters. The
number of input and output units is fixed by the number of sensors (16) and
classes (7). Different values for the number of hidden neurons and for the size of
the feature map, respectively, were tested systematically to obtain optimal results. For
Backpropagation (BPN) the best result was achieved with 11 hidden neurons.
After the topology had been fixed, the test set was used as validation set
to find the optimal number of training cycles and to avoid
overtraining (early stopping method). For the self-organizing feature map several
sizes were tested and the best results were achieved with a map size of 4 × 3.
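The early stopping idea can be sketched in a few lines. The version below is a generic illustration, not the procedure used in the study: the function name, the `patience` parameter and the error values are assumptions. It scans the validation error per training cycle and returns the cycle with the lowest error, giving up once no improvement has been seen for `patience` cycles.

```python
def early_stopping_epoch(validation_errors, patience=5):
    """Return (best_epoch, best_error); epochs are counted from 1."""
    best_epoch, best_err = 0, float("inf")
    for epoch, err in enumerate(validation_errors, start=1):
        if err < best_err:
            best_epoch, best_err = epoch, err
        elif epoch - best_epoch >= patience:
            break  # validation error has not improved for `patience` cycles
    return best_epoch, best_err


# Illustrative validation errors: they fall, then rise as the net overtrains.
errors = [0.90, 0.60, 0.40, 0.35, 0.36, 0.38, 0.41, 0.45, 0.50, 0.20]
best = early_stopping_epoch(errors, patience=5)
```

The choice of `patience` trades training time against the risk of stopping on a temporary plateau: with `patience=5` the scan above stops before it ever sees the last value.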

4.1.2 Quantitative Analysis To see how different evaluation methods perform
in predicting concentrations of gases they were applied to a measurement
of gas mixtures containing toluene, octane and propanol. The data set covered
mixtures with the same concentration of each gas (100 ppm, 200 ppm or 300 ppm)
and mixtures with the concentration of one gas slightly increased (by 20 ppm or
50 ppm). Additionally, for each gas, concentrations between 100 ppm and 350 ppm
were measured in the absence of the other two gases. For each mixture three samples
were taken. This led to a total of 144 samples.




Fig. 5. PCA scores plot of propanol, toluene, octane and their mixtures. Arrows 
indicate rising concentration (100 ppm to 950 ppm) 



The evaluation methods selected for this study were Multivariate Linear
Regression (MLR), Principal Components Regression (PCR), Partial Least
Squares (PLS), a Counterpropagation Neural Network (SOM/G) and a Standard
Backpropagation Neural Network (BPN). The mean prediction error in
percent (referring to the total concentration of the mixture) for each method
is displayed in Table 2. The number of principal components was varied
between 2 and 16. The number of hidden neurons for the BPN was determined
by the same procedure explained for the coffee dataset. The best results were
achieved with 22 hidden neurons, but even with seven hidden neurons the results
of the BPN were much better than the results of the statistical methods
PCR, MLR and PLS.



Table 2. Prediction error of different evaluation methods for gas mixtures in the range
of 100 ppm to 950 ppm

Ratio of reference data    Mean prediction error
to test data               MLR        PCR        PLS2       BPN        SOM/G
1/3 : 2/3                  5.617 %    4.951 %    4.970 %    2.730 %    0.752 %
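The text does not spell out how the mean prediction error relative to the total concentration is computed, so the sketch below shows one plausible reading, labeled as an assumption: the absolute error of each predicted gas concentration is divided by the total true concentration of the mixture, and the results are averaged over all gases and samples. The function name and the sample values are hypothetical.

```python
def mean_prediction_error_percent(true, predicted):
    """Mean absolute error per gas, in percent of the total mixture concentration.

    true, predicted: lists of per-sample concentration tuples (ppm).
    This normalization is an assumed reading of the error measure.
    """
    errors = []
    for t, p in zip(true, predicted):
        total = sum(t)
        errors.extend(abs(pi - ti) / total * 100.0 for ti, pi in zip(t, p))
    return sum(errors) / len(errors)


# One hypothetical mixture of toluene, octane and propanol at 100 ppm each.
true = [(100.0, 100.0, 100.0)]
predicted = [(103.0, 100.0, 97.0)]
error = mean_prediction_error_percent(true, predicted)
```

Under this reading, an error of 3 ppm on two of three gases in a 300 ppm mixture gives a mean error of about 0.67 %.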



4.2 Discussion 

BPN and KNN led to the best results for qualitative analyses. Both allow
nonlinear discrimination. KNN is one of the simplest classification algorithms
and is therefore easy to implement and easy to use. BPN can also be used for
quantitative analyses and can be seen as a kind of general purpose neural net.
In predicting gas concentrations, it outperformed the statistical methods MLR,
PCR and PLS. Despite the fact that small data sets were used, good results were
achieved with all covered data evaluation methods.



5 Internet Information Search and Delivery 

Another interesting project is the application of ANNs to information search.
Within the OASIS project (“Open Architecture Server for Information
Search and Delivery”) we develop a multi-server platform that is designed
to balance the load of internet search processes. Traditional search engines
cannot cope with user needs when searching for relevant information on the
internet. The delivered results often do not satisfy users because many
irrelevant documents are shown alongside the relevant ones.

The OASIS project aims to ease internet searching by introducing an open
architecture server that uses artificial neural networks to cluster HTML documents
at the result merging step and at the HTML collection description step. User
requests will be forwarded to those servers of the server system whose collection
topic is most likely to match the user request. Results from different topic
collections will be merged at the server that propagated the user request. This result
merging is carried out by a neural network that has been designed exclusively for
the OASIS project, called hierarchical radius-based competitive learning (HRCL).
5.1 Neural Network Clusterisation

5.1.1 Hierarchical Radius-Based Competitive Learning We use a hierarchical
radius-based competitive learning (HRCL) neural network, developed
exclusively for the OASIS project, to accomplish the clustering. First, it is
primarily based on the neural gas approach [13] and uses output neurons without
fixed grid dimensionality: neurons are rearranged according to their distances to
the current input every time a new input sample is generated. Second, HRCL
uses fixed radii around each neuron and repels those second winners from the
current input sample whose radii overlap with the winner’s radius, as in the
rival penalized competitive learning algorithm [14].
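The rank-based adaptation that HRCL borrows from the neural gas approach can be sketched as below. This shows only a plain neural gas step as commonly described in the literature; it does not include HRCL's radii, repulsion of rivals or hierarchy, and the function name and the parameters `eps` and `lam` are assumptions of this sketch.

```python
import math


def neural_gas_step(neurons, x, eps=0.5, lam=1.0):
    """Move every neuron towards x, with strength decaying in its distance rank."""
    # Rank neurons by squared distance to the current input (rank 0 = winner).
    order = sorted(
        range(len(neurons)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(neurons[i], x)),
    )
    updated = list(neurons)
    for rank, i in enumerate(order):
        h = eps * math.exp(-rank / lam)
        updated[i] = tuple(a + h * (b - a) for a, b in zip(neurons[i], x))
    return updated


neurons = [(0.0, 0.0), (10.0, 10.0)]
after = neural_gas_step(neurons, (1.0, 1.0))
```

The winner moves halfway towards the input, while the more distant neuron moves by a smaller fraction; no grid between output neurons is needed, because only the distance ranking matters.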

Third, HRCL builds a hierarchy of clusters and subclusters in either a top-down
or a bottom-up manner: the first generated top-down hierarchical level consists of
detected clusters or cluster prototypes. Every neuron is trained to
represent one prototype. The second level then refines the first level clustering using
the user-supplied fixed radius and tries to detect subclusters within every first level
cluster, and so forth. Initial neuron settings at each hierarchical level are
generated according to the probability densities at fixed cells in vector space, using a
cell-like clustering similar to the BANG-clustering system [15].



5.1.2 Advantages of HRCL Conventional statistical clustering methods
like single- or one-pass clustering as well as similar heuristic methods are highly
dependent on the order of input vectors fed into the system. Conventional neural
approaches, above all error minimizing competitive learning algorithms,
are generally able to detect major clusters.

Competitive learning methods with a-priori given and fixed output neuron
dimensionality like Kohonen's Self-Organizing Map (SOM) [3,4] also place
neurons at locations with lower probability densities.

Competitive learning methods without a given network dimensionality like Growing
Neural Gas (GNG) [16] use adapted network dimensionality in vector
hyperspaces with different ‘local’ fractal dimension and thus try to circumvent the
drawbacks mentioned above. The granularity of the clusters they detect is highly
dependent on the number of training steps: depending on the duration of the
training, GNG will find either clusters with appropriate cluster centroids or only
subclusters, but not both.

Figure 6 shows the training results of HRCL hierarchical clustering on
2-dimensional artificially arranged multi-modal input data using top-down
hierarchical refinement. The left picture depicts 3 neurons of the first hierarchical level
placed at cluster centroids after 565 HRCL learning steps using a user-supplied
neuron radius of 0.3. The right picture depicts 5 HRCL neurons of the second
hierarchy placed at subcluster centers of the cluster that is defined by one of the
first level neurons plus its radius, using an additional 95 learning steps. HRCL is able
to automatically build a hierarchy of clusters, subclusters and so on, depending
on the neuron settings at each level and the user-supplied radius. For the above-mentioned
input data, which consist of 1,800 vectors, HRCL automatically detects 3
hierarchy levels reflecting the globularity of every top-level cluster and takes
approximately 9 min on a Sun Sparc-Ultra 2.

Fig. 6. Left: first level HRCL; Right: second level HRCL



5.2 Discussion 

HRCL is not only able to detect overall clusters, but also produces a hierarchical
tree of clusters with appropriate cluster centroids and subclusters, subcluster
centroids, and so on, if feasible. Clusters are locations of high probability
density in vector space. Provided that coding and compression techniques are
available which generate vectors that lie adjacent in vector space if and only if they
share the same topic, clusters will represent HTML documents of the same or
similar thematic content. It is envisaged to use HRCL to cluster HTML
document collections in order to obtain a hierarchical tree of clusters and subclusters
in which each cluster is ideally related to an underlying topical content.
User requests will be compared to cluster centroids in order to speed up searching.
Even a Yahoo-like browsing facility is conceivable, letting the user
browse from cluster to subclusters and so on to refine her request.

6 Prediction of Functional Yield of Chips in 
Semiconductor Industry Applications 

In the semiconductor industry the number of circuits per chip is still increasing
drastically, while the dimensions of the chips are continuously reduced. The number
of circuits per chip doubles approximately every three years. Not
only the production cost per chip, but also the number of semiconductor factories
is increasing steadily. All this leads to a strong demand for high quality
production at very low prices, so the importance of quality
control and quality assurance has grown considerably.

One of the most expensive aspects of the chip manufacturing process
is the testing of functionality. Two different kinds of tests are used during
and after production: process control monitoring (yielding PCM-data)
early in the production process, and the 'functional test' for
testing circuits on finished chips (yielding functional data).

The second test in particular is not only very time consuming and therefore
expensive, but also very late, so machine problems are recognized far too late
and many defective chips are processed in the meantime. For this reason it
would be a great improvement if this test could be made obsolete and machine
problems or errors could be detected by using PCM-data.



6.1 Data Samples 

The aim of this work was the prediction of the functionality of chips by means 
of PCM-data (yield model). These data samples contain production conditions 
and physical values like currents or layer thicknesses and are used for the predic- 
tion tool. 4860 training vectors of PCM-data have been measured by industrial 
partners. Each training vector contains 96 components. Thus, the data is not 
easy to handle for data processing tools. 



6.2 Methods 

There are many possibilities to reduce the dimension of the training data, but
only a few of them preserve a good prediction. In our work we used
a regression method and principal component analysis (PCA). After this reduction
of parameters we compared the results of different neural network algorithms
to achieve the best results. In detail, we tried Radial Basis Functions (RBF) [17],
Counterpropagation [18] and feedforward networks like Backpropagation (BackProp)
[2] and Resilient Propagation (RProp) [19].

For all networks, we first trained with half of the data samples (2430) and all
components (96) and used the remaining half of the 4860 data samples as the
validation set. After that we reduced the input vectors to
10 components. The main reason was to achieve a good and understandable model for
the yield of the chips by means of PCM-data. Finally, we checked
the fitness of our model by forecasting the yield of chips with the reduced data
samples. The output of the networks was a value between 0 and 100,
namely the percentage of non-defective chips on each wafer.
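
The evaluation protocol described above (train on one half of the samples, validate on the other half, and report the absolute error of the predicted yield in percentage points) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the tooling used in this work: the function names, the plain first-half/second-half split and all numbers in the toy data are invented for the example.

```python
def split_half(samples):
    """Split a list of samples into a training half and a validation half."""
    half = len(samples) // 2
    return samples[:half], samples[half:]


def mean_absolute_error(predicted, measured):
    """Mean absolute difference between predicted and measured yield (% points)."""
    if len(predicted) != len(measured) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - b) for a, b in zip(predicted, measured)) / len(predicted)


# Stand-ins for the 4860 PCM vectors: only their indices are used here (assumption).
samples = list(range(4860))
train, validation = split_half(samples)

# Invented wafer yields in percent, purely for illustration.
measured = [80.0, 65.0, 92.0]
predicted = [74.0, 70.0, 90.0]
error = mean_absolute_error(predicted, measured)
```

With 4860 samples both halves contain 2430 vectors, matching the split described in the text.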



6.3 Results 

As expected from earlier research results [20,21], the Counterpropagation network
(8x12 SOM) with interpolation method proved to be stable and good
both in the prediction of yield with all parameters and with only 10 significant
parameters. Surprisingly, the results with RBF were even better: the
yield of a wafer was predicted with an absolute error of about 8 %, as shown
in Figure 7. Although the algorithm is known to be difficult in forecasting applications
with extrapolation requirements, the approximation of the yield was very
good for all three data sets, not only for the trained data samples
but also for the test data. The reason for these good results seems to be the high
number of training samples and the relatively high noise which was applied
in the training phase.




Fig. 7. The result of the prediction with the RBF network and testing with a
validation data set (horizontal axis: number of sample)



With BackProp (96-15-15-1) and (10-4-1), convergence of the training was often
a problem and therefore we were not able to obtain a proper result. The
tests with RProp (96-15-1) and (10-3-1) converged much faster and
the resulting values were much better, too; the results are in the range of
the results of the Counterpropagation algorithm.

If we compare the results of the training with the reduced data set to the
training results with all of the 96 parameters we can see a small difference of
1-2 % in the resulting yield. The reduction with PCA seems to deliver the better
results for the prediction, but it is much more difficult to find out the original
parameters which are important for the changes in the yield, since only
the eigenvectors of the covariance matrix are available. We will therefore use the
regression method for the reduction of parameters in the future. An overview of the
results is given in Table 3.



Table 3. Results of the prediction with different networks and different data
samples. The absolute error of the prediction is given in percent

                 Counterprop       RBF               BackProp          RProp
Samples          Training  Test    Training  Test    Training  Test    Training  Test
96 parameters    5.9       9.1     5.7       8.2     13.7      13.9    6.4       9.3
Regression       6.7       9.9     6.2       8.8     10.1      11.3    7.2       10.8
PCA              6.4       9.1     6.4       8.6     9.9       11.1    6.9       10.2

6.4 Discussion

In our work we were able to predict the functional yield of wafers with an absolute
error of less than 8 %. This result is precise enough to detect automatically
whether many chips on a wafer are defective or not. We extracted some parameters
to have a small and clear yield model for the experts in the semiconductor
factory. When checked by experts, the extracted parameters proved to be very
important and most obviously responsible for yield variations in the given data
samples. Within this work it was possible to forecast the yield of chips very early
in the production process; additionally, the result provides a tool for fast error
recovery by the experts thanks to the small number of responsible parameters.

7 Real Time Processing of Nerve Signals for Controlling
a Limb Prosthesis

ANNs can be used for signal processing in the medical field as well. Signal
processing within the medical field requires high flexibility and good generalization
skills, which can be obtained using ANNs. One application of ANNs in the
medical field is the INTER-project.

The aim of the INTER-project (Intelligent Neural InTERface) is to investigate
fundamental issues related to the design and fabrication of a new generation
of microsystems applicable as neural prostheses. A global overview of a
PNS-controlled limb prosthesis is given in [22] and is shown in Figure 8.




Fig. 8. Schematic configuration of a bio-neural controlled prosthesis



Nerve signals will be recorded and amplified by a regeneration-type neurosensor.
Then, an artificial neural net (ANN) is applied which classifies the
resulting signals in order to assign certain limb movements to the signal classes.
A control unit uses the resulting information to regulate the movement of the
prosthesis [23,24].

Ideally, the prosthesis is equipped with sensors. Signals from the sensors
will be processed by an ANN and transmitted via a signal generator and the
neurosensor to the peripheral nervous system (PNS), resulting in a kind of natural
limb control.



7.1 The Neurosensor 

The principle of the implementation and the neurosensor which is used in the
INTER-project are shown in Figure 9. Peripheral nerves of vertebrates will regenerate
if severed. For this reason, the peripheral nerve can be surgically severed in
order to insert the proximal and the distal stump into a guidance channel which
envelops the neurosensor.





Fig. 9. Implementation scheme for the regeneration-type neurosensor



The sensor is fabricated of polyimide perforated by multiple 'via holes' [25].
The axons regenerate through the via holes from the proximal stump towards
the distal stump of the nerve. Nerve signals can be recorded by electrodes which
enclose some of the via holes. A circuitry amplifies and preprocesses the
nerve signals. The amplified signals are transferred to the units controlling the
prosthesis as shown in Figure 8.
7.2 Data Set

The data set which is used for the classification has been recorded by the Institut
für Biomedizinische Technik (IBMT). IBMT has chosen the stomatogastric
nervous system (STNS) of the crab Cancer pagurus as described in [26].




Fig. 10. One detail out of the recordings from the gastric nerve of a crab. PD, 
LP and PY cells can be easily identified 



A typical recorded sequence of the signals is shown in Figure 10. The STNS
contains about 30 nerve cell bodies, 24 of which are motorneurons and 6 of which
are interneurons. The action potentials corresponding to the PD, LP and PY
motorneurons can be easily identified. The durations of the recordings are 24
and 40 seconds, respectively. The data set was recorded using a sample frequency of
5 kHz. Since this nervous system is very well known, we are able to verify the
results obtained by the classification of the SOM for their correctness.



7.3 Classification Using Kohonen's SOM

For the classification of the data set a two-dimensional SOM [3,4] with 10 neurons
in each dimension has been applied. The training data set consists of
2667 vectors with six components describing the shape of a spike of a neuron.

After obtaining a well ordered map, we have identified the clusters within this
map using CLUSOT (Cluster in self-organized maps) [27]. Each of the obtained
clusters represents one specific signal from an axon or from a group of
axons (e.g. PD or PY cells). Thanks to the identification of the clusters within the
trained SOM, we can assign an action to each cluster, triggered by a signal of
an axon. The obtained clusters are presented in Figure 11.

As mentioned above, each cluster represents the signals from an axon or
from a group of axons. Every time a nerve signal occurs, it will be classified
to its corresponding cluster. Thus we are able to recognize the signals from
certain axons in order to control the movement of the limb prosthesis. In our
case we control a commercial artificial hand whose actions are assigned to the
clusters as shown in Figure 11.

Fig. 11. The trained SOM with obtained clusters. Two of the clusters have been
chosen to assign an action of the prosthesis (labelled 'Close prosthesis' and
'Open prosthesis')
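
The classification step described above can be illustrated by a small sketch: a spike-shape vector is mapped to its best-matching map neuron, the neuron to its cluster, and the cluster to an action. Everything concrete here is an assumption made for the example: the 2x2 toy map (the map in the text has 10x10 neurons), the weight values, the cluster labelling and the assignment of 'open' and 'close' to particular clusters are invented, and the function names are hypothetical.

```python
import math


def best_matching_unit(weights, vector):
    """Return (row, col) of the map neuron whose weight vector is closest to `vector`."""
    best, best_dist = None, math.inf
    for r, row in enumerate(weights):
        for c, w in enumerate(row):
            # squared Euclidean distance between neuron weights and input
            d = sum((wi - vi) ** 2 for wi, vi in zip(w, vector))
            if d < best_dist:
                best, best_dist = (r, c), d
    return best


def classify(weights, cluster_of, action_of, vector):
    """Map a spike vector to an action via its neuron's cluster (None if unassigned)."""
    cluster = cluster_of.get(best_matching_unit(weights, vector))
    return action_of.get(cluster)


# Toy 2x2 map with six-component weights (invented values).
weights = [
    [[0.0] * 6, [1.0] * 6],
    [[2.0] * 6, [3.0] * 6],
]
# Hypothetical cluster labels per neuron and actions per cluster.
cluster_of = {(0, 0): "PD", (0, 1): "PD", (1, 0): "LP", (1, 1): "PY"}
action_of = {"PD": "open", "PY": "close"}  # no action assigned to LP

action = classify(weights, cluster_of, action_of, [0.9] * 6)
```

Signals falling into a cluster without an assigned action simply yield no action in this sketch.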



7.4 Control Unit of the Prosthesis

After classifying the nerve signals to their corresponding clusters, the occurrence
of a nerve signal of a certain cluster must be assigned to its corresponding action.
Since the information in the nerve signals is pulse-frequency encoded, we have to
move to the time domain. The problem here is that the occurrence
of one single nerve signal does not carry useful information: it might be a
spontaneous or random signal. For these reasons we have decided to build an
integration based signal interpreter to drive the control unit of the prosthesis.
A detailed description of the control unit is given in [24].
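
To make the idea of an integration based interpreter more concrete, here is one possible reading of it, sketched as a sliding-window spike counter: an action is considered active only while enough spikes of the same cluster have arrived within a short time window, so that a single isolated spike is ignored. This is an assumption made for illustration and not the control unit of [24]; the function name, the window length and the threshold are hypothetical.

```python
from collections import deque


def active_times(spike_times, window, threshold):
    """Return the spike times at which at least `threshold` spikes
    (including the current one) lie within the last `window` seconds.

    `spike_times` is assumed to be sorted in increasing order.
    """
    recent = deque()
    active = []
    for t in spike_times:
        recent.append(t)
        # drop spikes that have left the integration window
        while recent and recent[0] <= t - window:
            recent.popleft()
        if len(recent) >= threshold:
            active.append(t)
    return active


# Invented example: one isolated spike, then a burst, then another isolated spike.
spikes = [0.10, 1.00, 1.01, 1.02, 1.03, 2.00]
result = active_times(spikes, window=0.05, threshold=3)
```

In this toy run only the third and fourth spike of the burst count as active; both isolated spikes are ignored.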



7.5 Discussion 

In this chapter, we have presented a signal processing unit based on artificial
neural networks which is able to interpret real nerve signals and to control a
limb prosthesis in real time. The signal processing system consists of a
self-organizing map (SOM) and an integration based signal interpreter. The SOM
classifies the nerve signal according to its origin. The integration based signal
interpreter controls the movement of the limb prosthesis based on the frequency
of occurrence of the nerve signals. Together, the two parts of the processing unit
realize a pulse-frequency decoding of nerve signals.

To summarize, we have presented a real time processing system which is able
to control a limb prosthesis from incoming nerve signals. We have shown that
it is possible to interpret nerve signals from recordings of a sum of axons in real
time.
8 Conclusion

After a brief introduction to artificial neural nets, we have presented different
applications of data and signal processing covering a wide range of
domains, from industrial large-scale production to health care. The presented
applications were color restoration, gas sensing systems, internet information
search and delivery, online quality control and nerve signal processing. All signal
processing problems within these applications could be solved by the use of
ANNs, although we have shown that different types of ANNs have to be chosen
for different requirements.

To conclude, we have presented solutions for various data and signal processing
problems. We have shown that artificial neural nets can be applied not
only to academic problems but also to industrial applications. This is due to
their abilities, such as learning relationships from data samples, parallelism and
insensitivity to noisy data. Especially in the case of nonlinear relationships within
the data samples, ANNs are extremely advantageous.
Abstract. We introduce a new automaton on a word $p$, a sequence of letters
taken from an alphabet $\Sigma$, that we call the factor oracle. This automaton is
acyclic, recognizes at least the factors of $p$, has $m + 1$ states and a linear
number of transitions. We give an on-line construction to build it. We
use this new structure in string matching algorithms that we conjecture to be
optimal according to the experimental results. These algorithms are as
efficient as the ones that already exist while using less memory and being
easier to implement.

Keywords: indexing, finite automaton, pattern matching, algorithm 
design. 



1 Introduction 

A word $p$ is a finite sequence $p = p_1 p_2 \ldots p_m$ of letters taken from an alphabet $\Sigma$.
We keep the notation $p$ throughout this paper to denote the word on which we are
working.

Efficient pattern matching on fixed texts is based on indexes built on top of
the text. Many indexing techniques exist for this purpose. The simplest methods
use precomputed tables of q-grams while more advanced methods use more elaborate
data structures. These classical structures are: suffix arrays, suffix trees,
suffix automata or DAWGs, and factor automata (see [11]). When regarded as
automata, they accept the set of factors (substrings) of the text. All these structures
lead to very time-efficient pattern matching algorithms but require a fairly
large amount of memory space. It is considered, for example, that the implementation
of suffix arrays can be achieved using five bytes per text character and
that other structures need about twelve bytes per text character.

Several strategies have been developed to reduce the memory space required 
to implement structures for indexes. 

Work by this author is supported in part by Programme “Genomes” of C.N.R.S.

DAWGs, Directed Acyclic Word Graphs, are just suffix automata in which all states
are terminal states.

J. Pavelka, G. Tel, M. Bartosek (Eds.): SOFSEM’99, LNCS 1725, pp. 295-310, 1999.

(c) Springer-Verlag Berlin Heidelberg 1999
One of the oldest methods is to merge the compression techniques applied
both by the suffix tree and the suffix automaton. It leads to the notion of compact
suffix automaton (or compact DAWG) [5]. The direct construction of this
structure is given in [12,13].

A second method to reduce the size of indexes has been considered in the text
compression method in [10]. It consists in representing the complement language
of the factors (substrings) of the text. More precisely, only minimal factors not
occurring in the text need to be considered [9,8], which allows them to be stored in
a tree and saves space.

We present in this paper a third method. We want to build an automaton (a)
that is acyclic, (b) that recognizes at least the factors of $p$, (c) that has as few
states as possible and (d) that has a linear number of transitions. Note
that such an automaton necessarily has at least $m + 1$ states.

The suffix or factor automaton [4,7] satisfies (a)-(b)-(d) but not (c), whereas
the sub-sequence automaton [3] satisfies (a)-(b)-(c) but not (d), which makes
the problem non-trivial.

We propose an intermediate structure that we call the factor oracle: an
automaton with $m + 1$ states that satisfies these four requirements.

We use this new structure to design new string matching algorithms. These
algorithms have a very good average behaviour that we conjecture to be optimal.
The main advantages of these new algorithms are (1) that they are easy to implement
while keeping this behaviour and (2) the memory saving that the factor oracle
allows with respect to the suffix automaton. The structure has been extended
in [2] to implement the index of a finite set of texts.

The paper is structured as follows: Section 2 discusses the construction of the
factor oracle. Section 3 describes string matching based on the factor oracle
and shows experimental results, and finally we conclude in Section 4. Proofs of
the results presented in the paper may be found in [1]. We now introduce the notions
and definitions that we need throughout this paper.

A word $x \in \Sigma^*$ is a factor of $p$ if and only if $p$ can be written $p = uxv$ with
$u, v \in \Sigma^*$. We denote by $Fact(p)$ the set of all the factors of the word $p$. A factor $x$ of
$p$ is a prefix (resp. a suffix) of $p$ if $p = xu$ (resp. $p = ux$) with $u \in \Sigma^*$. The set of
all the prefixes of $p$ is denoted by $Pref(p)$ and the set of all the suffixes by $Suff(p)$.
We say that $x$ is a proper factor (resp. proper prefix, proper suffix) of $p$ if $x$ is a
factor (resp. prefix, suffix) of $p$ distinct from $p$ and from the empty word $\epsilon$.

We denote by $pref_p(i)$ the prefix of length $i$ of $p$ for $0 \le i \le |p|$.

We denote for $u \in Fact(p)$, $poccur(u, p) = \min\{|z| : z = wu \text{ and } p = wuv\}$, the
ending position of the first occurrence of $u$ in $p$.

Finally, we define for $u \in Fact(p)$ the set $endpos_p(u) = \{i \mid p = w u p_{i+1} \ldots p_m\}$.
If two factors $u$ and $v$ of $p$ are such that $endpos_p(u) = endpos_p(v)$,
we denote $u \sim_p v$. It is very easy to verify that $\sim_p$ is an equivalence
relation; it is in fact the syntactic equivalence of the language $Suff(p)$.
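
The definitions above translate almost literally into code. The following sketch is a naive illustration (quadratic or worse, and not part of the construction presented in this paper); the function names mirror the notation, positions are 1-based as in the text, and the example words are chosen for illustration.

```python
def factors(p):
    """Fact(p): the set of all factors (substrings) of p, including the empty word."""
    return {p[i:j] for i in range(len(p) + 1) for j in range(i, len(p) + 1)}


def poccur(u, p):
    """Ending position (1-based) of the first occurrence of u in p."""
    start = p.find(u)
    if start < 0:
        raise ValueError("u is not a factor of p")
    return start + len(u)


def endpos(u, p):
    """Set of ending positions (1-based) of all occurrences of u in p."""
    return {i + len(u) for i in range(len(p) - len(u) + 1) if p.startswith(u, i)}


p = "abbbaab"
first_end = poccur("b", p)         # first 'b' ends at position 2
ends_b = endpos("b", p)            # every position holding a 'b'
same_class = endpos("bbb", p) == endpos("abbb", p)   # both only end at position 4
```

Two factors are equivalent under $\sim_p$ exactly when `endpos` returns the same set for both, as for `bbb` and `abbb` in this example.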
2 Factor Oracle

2.1 Construction Algorithm 



Build_Oracle($p = p_1 p_2 \ldots p_m$)

1. For $i$ from 0 to $m$

2.     Create a new state $i$

3. For $i$ from 0 to $m - 1$

4.     Build a new transition from $i$ to $i + 1$ by $p_{i+1}$

5. For $i$ from 0 to $m - 1$

6.     Let $u$ be a minimal length word in state $i$

7.     For all $\sigma \in \Sigma$, $\sigma \ne p_{i+1}$

8.         If $u\sigma \in Fact(p_{i-|u|+1} \ldots p_m)$

9.             Build a new transition from $i$ to $i + poccur(u\sigma, p_{i-|u|+1} \ldots p_m)$ by $\sigma$



Fig. 1. High-level construction algorithm of the Oracle 



Definition 1 The factor oracle of a word $p = p_1 p_2 \ldots p_m$ is the automaton built
by the algorithm Build_Oracle (Figure 1) on the word $p$, where all the states are
terminal. It is denoted by $Oracle(p)$.

The factor oracle of the word $p = abbbaab$ is given as an example in Figure 2.
On this example, it can be noticed that the word $aba$ is recognized whereas it is
not a factor of $p$.




Fig. 2. Factor oracle of $abbbaab$. The word $aba$ is recognized whereas it is not a
factor



Note: all the transitions that reach state $i$ of $Oracle(p)$ are labeled by $p_i$.

Lemma 1 Let $u \in \Sigma^*$ be a minimal length word among the words recognized in
state $i$ of $Oracle(p)$. Then, $u \in Fact(p)$ and $i = poccur(u, p)$.

Corollary 1 Let $u \in \Sigma^*$ be a minimal length word among the words recognized
in state $i$ of $Oracle(p)$. Then $u$ is unique.

We denote by $\min(i)$ the minimal length word of state $i$.

Corollary 2 Let $i$ and $j$ be two states of $Oracle(p)$ such that $j < i$. Let $u = \min(i)$
and $v = \min(j)$. Then $u$ cannot be a suffix of $v$.



Lemma 2 Let $i$ be a state of $Oracle(p)$ and $u = \min(i)$. Then $u$ is a suffix of any
word $c \in \Sigma^*$ which is the label of a path leading from state 0 to state $i$.



Lemma 3 Let $w \in Fact(p)$. Then $w$ is recognized by $Oracle(p)$ in a state
$j \le poccur(w, p)$.

Note: In Lemma 3, $j$ is really less than or equal to $poccur(w, p)$, and not always equal.
The example given in Figure 3 represents the automaton $Oracle(abbcabc)$,
and the state reached after the reading of the word $abc$ is strictly less than
$poccur(abc, abbcabc)$.







Fig. 3. Example of a factor ($abc$) that is not recognized at the end of its first
occurrence but before



Corollary 3 Let $w \in Fact(p)$. Every word $v \in Suff(w)$ is recognized by
$Oracle(p)$ in a state $j \le poccur(w, p)$.



Lemma 4 Let $i$ be a state of $Oracle(p)$ and $u = \min(i)$. Any path ending by $u$
leads to a state $j \ge i$.



Lemma 5 Let $w \in \Sigma^*$ be a word recognized by $Oracle(p)$ in $i$; then any suffix
of $w$ is recognized in a state $j \le i$.

The number of states of $Oracle(p)$ with $p = p_1 p_2 \ldots p_m$ is $m + 1$. We now
consider the number of transitions.

Lemma 6 The number $T_{Or}(p)$ of transitions in $Oracle(p = p_1 p_2 \ldots p_m)$ satisfies
$m \le T_{Or}(p) \le 2m - 1$.
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2.2 On-line Algorithm 

This section presents an on-line construction of the automaton Oracle{p), that 
means a way of building the automaton by reading the letters of p one by one 
from left to right. 

We denote repet ^{i) the longest suffix of pref that appears at least twice 
in prefpii). 

We define a function Sp defined on the states of the automaton, called supply 
function, that maps each state i > 0 of Oraele{p) to state j in which the reading 
of repet p{i) ends. We arbitrarily set 5'p(0) = —1. 

Notes: 

— Sp{i) is well defined for every state i of Oraele{p) (Corollary 3). 

— For any state i of Oraele{p), i > Sp{i) (lemma 3). 

We denote = m, ki = Sp{ki-i) for i > 1. The sequence of the ki is finite, 
strictly decreasing and ends in state 0. We denote 

CSp = {ko = m, /ci, . . . ,kt =0} 

the suffix path of p in Oraele{p). 

Lemma 7 Let $k > 0$ be a state of $Oracle(p)$ such that $s = S_p(k)$ is strictly
positive. We denote $w_k = repet_p(k)$ and $w_s = repet_p(s)$. Then $w_s$ is a suffix
of $w_k$.



Corollary 4 Let $CS_p = \{k_0 = m, k_1, \ldots, k_t = 0\}$ be the suffix path of $p$ in $Oracle(p)$
and let $w_i = repet_p(k_{i-1})$ for $1 \le i \le t$ and $w_0 = p$. Then, for $0 < l \le t$, $w_l$ is a
suffix of all the $w_i$, $0 \le i < l \le t$.

We now consider for a word $p = p_1 p_2 \ldots p_m$ and a letter $\sigma \in \Sigma$ the construction
of $Oracle(p\sigma)$ from $Oracle(p)$.

We denote by $Oracle(p) + \sigma$ the automaton $Oracle(p)$ to which a transition by
$\sigma$ from state $m$ to state $m + 1$ is added. Note that a transition that
exists in $Oracle(p) + \sigma$ also exists in $Oracle(p\sigma)$, so that the difference between
the two automata can only lie in transitions by $\sigma$ to state $m + 1$ that have to
be added to $Oracle(p) + \sigma$ in order to get $Oracle(p\sigma)$.

We now investigate the states from which there may be transitions by $\sigma$ to state
$m + 1$.

Lemma 8 Let $k$ be a state of $Oracle(p) + \sigma$ such that there is a transition from
$k$ by $\sigma$ to $m + 1$ in $Oracle(p\sigma)$. Then $k$ has to be one of the states on the suffix
path $CS_p = \{k_0 = m, k_1, \ldots, k_t = 0\}$ in $Oracle(p) + \sigma$.

Among the states on the suffix path of $p$, every state that has no transition
by $\sigma$ in $Oracle(p) + \sigma$ must have one in $Oracle(p\sigma)$. More formally, the following
lemma states this fact.
Lemma 9 Let $k_l < m$ be a state on the suffix path $CS_p = \{k_0 = m, k_1, \ldots,
k_t = 0\}$ of state $m$ in $Oracle(p = p_1 p_2 \ldots p_m) + \sigma$. If $k_l$ does not have a transition
by $\sigma$ in $Oracle(p)$, then there is a transition by $\sigma$ from $k_l$ to $m + 1$ in $Oracle(p\sigma)$.



Lemma 10 Let $k_l < m$ be a state on the suffix path $CS_p = \{k_0 = m,
k_1, \ldots, k_t = 0\}$ in $Oracle(p = p_1 p_2 \ldots p_m) + \sigma$. If $k_l$ has a transition by $\sigma$
in $Oracle(p) + \sigma$, then all the states $k_i$, $l \le i \le t$, also have a transition by $\sigma$ in
$Oracle(p) + \sigma$.

The idea of the on-line construction algorithm is the following. According to
Lemmas 8, 9 and 10, to transform $Oracle(p) + \sigma$ into $Oracle(p\sigma)$ we only
have to go down the suffix path $CS_p = \{k_0 = m, k_1, \ldots, k_t = 0\}$ of state $m$:
while the current state $k_l$ does not have an exiting transition by $\sigma$, a transition
by $\sigma$ to $m + 1$ is added (Lemma 9). If $k_l$ already has one, the process
ends because, according to Lemma 10, all the states $k_j$ after $k_l$ on the suffix path
already have a transition by $\sigma$.

If we only wanted to add a single letter, the preceding algorithm would be
enough. But, as we want to be able to build the automaton by adding the letters
of $p$ one after the other, we have to be able to update the supply function
$S_{p\sigma}$ of the new automaton $Oracle(p\sigma)$. As (according to the definition of $S_p$)
the supply function of states $0 \le i \le m$ does not change from $Oracle(p)$ to
$Oracle(p\sigma)$, the only thing to do is to compute $S_{p\sigma}(m + 1)$. This is done with
the following lemma.

Lemma 11 If there is a state $k_d$ which is the greatest element of $CS_p = \{k_0 =
m, k_1, \ldots, k_t = 0\}$ in $Oracle(p)$ such that there is a transition by $\sigma$ from $k_d$ to a
state $s$ in $Oracle(p)$, then $S_{p\sigma}(m + 1) = s$ in $Oracle(p\sigma)$. Else $S_{p\sigma}(m + 1) = 0$.

From these lemmas we can now deduce an algorithm add_letter to transform
$Oracle(p)$ into $Oracle(p\sigma)$. It is given in Figure 4.

Lemma 12 The algorithm add_letter really builds $Oracle(p = p_1 p_2 \ldots p_m \sigma)$
from $Oracle(p = p_1 p_2 \ldots p_m)$ and updates the supply function of the new state
$m + 1$ of $Oracle(p\sigma)$.

The complete on-line algorithm to build $Oracle(p = p_1 p_2 \ldots p_m)$ just consists
in adding the letters $p_i$ one by one from left to right. It is given in Figure 5.

Theorem 1 The algorithm Oracle-on-line($p = p_1 p_2 \ldots p_m$) builds $Oracle(p)$.



Theorem 2 The complexity of Oracle-on-line($p = p_1 p_2 \ldots p_m$) is $O(m)$ in time
and in space.
Function add_letter($Oracle(p = p_1 p_2 \ldots p_m)$, $\sigma$)

1. Create a new state $m + 1$

2. Create a new transition from $m$ to $m + 1$ labeled by $\sigma$

3. $k \leftarrow S_p(m)$

4. While $k > -1$ and there is no transition from $k$ by $\sigma$ Do

5.     Create a new transition from $k$ to $m + 1$ by $\sigma$

6.     $k \leftarrow S_p(k)$

7. End While

8. If ($k = -1$) Then $s \leftarrow 0$

9. Else $s \leftarrow$ the state reached by the transition from $k$ by $\sigma$

10. $S_{p\sigma}(m + 1) \leftarrow s$

11. Return $Oracle(p = p_1 p_2 \ldots p_m \sigma)$



Fig. 4. Add a letter $\sigma$ to $Oracle(p = p_1 p_2 \ldots p_m)$ to get $Oracle(p\sigma)$



Oracle-on-line($p = p_1 p_2 \ldots p_m$)

1. Create $Oracle(\epsilon)$ with:

2.     one single state 0

3.     $S_\epsilon(0) \leftarrow -1$

4. For $i \leftarrow 1$ to $m$ Do

5.     $Oracle(p = p_1 p_2 \ldots p_i) \leftarrow$ add_letter($Oracle(p = p_1 p_2 \ldots p_{i-1})$, $p_i$)

6. End For



Fig. 5. On-line construction algorithm of $Oracle(p = p_1 p_2 \ldots p_m)$
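
The two procedures of Figures 4 and 5 can be combined into a short runnable sketch. It assumes that transitions are stored in one dictionary per state and the supply function in a list; these representation choices and the names `build_oracle` and `recognizes` are ours and only serve the illustration.

```python
def build_oracle(p):
    """On-line construction of the factor oracle of p.

    Returns (trans, supply) where trans[i] maps a letter to the target
    state of the transition leaving state i, and supply[i] is the value
    of the supply function for state i (supply[0] == -1).
    """
    trans = [{}]     # Oracle of the empty word: one single state 0
    supply = [-1]
    for m, sigma in enumerate(p):      # m letters have been read so far
        trans.append({})               # create the new state m + 1
        supply.append(0)
        trans[m][sigma] = m + 1        # transition from m to m + 1 by sigma
        k = supply[m]
        # go down the suffix path while no transition by sigma exists
        while k > -1 and sigma not in trans[k]:
            trans[k][sigma] = m + 1
            k = supply[k]
        supply[m + 1] = 0 if k == -1 else trans[k][sigma]
    return trans, supply


def recognizes(trans, word):
    """True if `word` labels a path from state 0 (every state is terminal)."""
    state = 0
    for letter in word:
        if letter not in trans[state]:
            return False
        state = trans[state][letter]
    return True


trans, supply = build_oracle("abbbaab")
n_transitions = sum(len(t) for t in trans)
```

For the word abbbaab this yields 8 states and 11 transitions, which lies within the bounds of Lemma 6, and the word aba is recognized although it is not a factor.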



Note The constants involved in the asymptotic bound of the complexity of the
on-line construction algorithm depend on the implementation and may involve
the size of the alphabet $\Sigma$. If we implement the transitions in a way that they
are accessible in $O(1)$ (use of tables), then the complexity is $O(m)$ in time and
$O(|\Sigma| \cdot m)$ in space. If we implement the transitions in a way that they are
accessible in $O(\log|\Sigma|)$ (use of search trees), then the complexity is $O(\log|\Sigma| \cdot m)$
in time and $O(m)$ in space.

Example The on-line construction of $Oracle(abbbaab)$ is given in Figure 6.

3 String Matching

The factor oracle of $p$ can be used in the same way as the suffix automaton in
string matching in order to find the occurrences of a word $p = p_1 p_2 \ldots p_m$ in a
text $T = t_1 t_2 \ldots t_n$, both on an alphabet $\Sigma$.
Fig. 6. On-line construction of $Oracle(abbbaab)$ (surviving panel titles: (e) Add b,
(f) Add a, (g) Add a, (h) Add b). The dot-lined arrows represent
the supply function

The suffix automaton is used in [14,11] to get an algorithm that is optimal on
average, called BDM (for Backward Dawg Matching). Its average complexity is
$O(n \log_{|\Sigma|}(m)/m)$ under a Bernoulli model of probability where all the letters
are equiprobable.

The BDM algorithm moves a window of size $m$ on the text. For each new
position of this window, the suffix automaton of $p^r$ (the mirror image of $p$) is
used to search for a factor of $p$ from the right to the left of the window.

The basic idea of the BDM is that if this backward search fails on a letter
$\sigma$ after the reading of a word $u$, then $\sigma u$ is not a factor of $p$ and moving the
beginning of the window just after $\sigma$ is safe. This idea is then refined in the
BDM using some properties of the suffix automaton.
Fig. 7. Shift of the search window after the failure of the search by $Oracle(p)$. The
word $\sigma u$ is not a factor of $p$ (figure labels: window; search in oracle; $u$; $\sigma$;
search fails in $\sigma$; new search; window shift)



However, this idea alone is enough to obtain an efficient string matching algorithm.
Most remarkably, the strict recognition of the factors (which the
factor and suffix automata allow) is not necessary. For the algorithm to work, it
is enough to know that $u\sigma$ is not a factor of $p$. The oracle can be used to replace
the suffix automaton, as illustrated by Figure 7. We call this new algorithm
BOM, for Backward Oracle Matching. The pseudo-code of BOM is given in Figure 8.
Its correctness is stated in Lemma 13. We conjecture (according to the
experimental results) that BOM is still optimal on average.

Lemma 13 The BOM algorithm marks all the occurrences of $p$ in $T$ and only
them.

The worst-case complexity of BOM is $O(nm)$. However, on average, we
make the following conjecture based on experimental results (see 3.2):

Conjecture 1 Under a model of independence and equiprobability of letters, the
BOM algorithm has an average complexity of $O(n \log_{|\Sigma|}(m)/m)$.



3.1 A Linear Algorithm in the Worst Case 

Even if the preceding algorithms are very efficient in practice, they have a worst-case
complexity in $O(mn)$. There are several techniques to make the BDM algorithm
(using the suffix automaton) linear in the worst case, and one of them can
also be used to make our algorithms linear in the worst case. It uses the
Knuth-Morris-Pratt (KMP) algorithm to make a forward reading of some characters in
the text.
BOM($p = p_1 p_2 \ldots p_m$, $T = t_1 t_2 \ldots t_n$)

1. Pre-processing

2.     Construction of the oracle of $p^r$

3. Search

4.     $pos \leftarrow 0$

5.     While ($pos \le n - m$) do

6.         $state \leftarrow$ initial state of $Oracle(p^r)$

7.         $j \leftarrow m$

8.         While $state$ exists do

9.             $state \leftarrow$ image state by $T[pos + j]$ in $Oracle(p^r)$

10.            $j \leftarrow j - 1$

11.        EndWhile

12.        If $j = 0$ do

13.            mark an occurrence at $pos + 1$

14.            $j \leftarrow 1$

15.        EndIf

16.        $pos \leftarrow pos + j$

17.    EndWhile



Fig. 8. Pseudo-code of BOM algorithm 
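
The search loop of Figure 8 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The names `build_factor_oracle` and `bom_search` are hypothetical. Positions are 0-based, so an occurrence that the figure marks at pos + 1 is reported here as pos. The inner loop carries an explicit guard `j > 0`, and the shift after a failure is written so that the new window starts just after the character on which the oracle failed. The oracle is built with the usual on-line construction based on a supply function, which is assumed here because it is described outside this section.

```python
def build_factor_oracle(word):
    """Build the factor oracle of `word` (assumed on-line construction).

    Returns a list of dicts: trans[state][letter] -> next state.
    State 0 is the initial state; state i corresponds to the prefix of length i.
    """
    m = len(word)
    trans = [dict() for _ in range(m + 1)]
    supply = [-1] * (m + 1)  # supply function, -1 for the initial state
    for i, letter in enumerate(word):
        trans[i][letter] = i + 1  # internal transition i -> i + 1
        k = supply[i]
        # Add external transitions along the supply path.
        while k > -1 and letter not in trans[k]:
            trans[k][letter] = i + 1
            k = supply[k]
        supply[i + 1] = 0 if k == -1 else trans[k][letter]
    return trans


def bom_search(pattern, text):
    """Return the 0-based start positions of all occurrences of pattern in text."""
    m, n = len(pattern), len(text)
    if m == 0 or m > n:
        return []
    trans = build_factor_oracle(pattern[::-1])  # oracle of the reversed pattern
    occurrences = []
    pos = 0
    while pos <= n - m:
        state = 0
        j = m
        # Read the window from right to left while the oracle accepts.
        while j > 0 and state is not None:
            state = trans[state].get(text[pos + j - 1])
            j -= 1
        if state is not None:
            # All m characters were read: the only word of length m accepted
            # by the oracle is the reversed pattern itself.
            occurrences.append(pos)
            pos += 1
        else:
            # The failing character is text[pos + j]; restart just after it.
            pos += j + 1
    return occurrences


if __name__ == "__main__":
    print(bom_search("abbbaab", "abbbaababbbaabbbaab"))
```

Storing the transitions in dictionaries keeps the sketch short; the experiments reported in Section 3.2 use tables instead, which is a separate design choice.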



To explain the combined use of KMP and the (factor or suffix) oracle, we consider
the current position before the search with the oracle: a prefix v of the pattern
has already been read with KMP at the beginning of the search window, and we
start the backward search using the oracle from the right end of that current
window. The end position of v in the current window is called the critical
position and is denoted by Critpos. The current position is shown schematically
in Figure 9.





Fig. 9. Current position in the linear algorithm using both KMP and (factor or
suffix) oracle



We use the search with the oracle from right to left, starting from the right
end of the window. We consider two cases, depending on whether the critical
position is reached or not.

1. The critical position is not reached. The failure of the recognition of a factor
occurs on character a as in the general approach (Figure 7). We shift the
window to the left until its beginning goes past character a. We restart a
KMP search on this new window, rereading the characters already read by the
oracle. This search stops in a new current position (with a new corresponding
critical position) when the recognized prefix is small enough (shorter than αm
with 0 < α < 1). The value of α is discussed with the experimental results
(see Section 3.2); typically α = 1/2. This situation is shown schematically in
Figure 10.






Fig. 10. First case: the critical position is not reached






2. The critical position is reached. We resume the KMP search from the critical
position, from the state we were in before stopping, rereading at least the
characters read by the oracle. We then go on reading the text until the longest
recognized prefix is small enough (shorter than αm). This situation is shown
schematically in Figure 11.

This algorithm can be used with a backward search done with the factor 
oracle. We call this new algorithm Turbo-BOM. Concerning the complexity in 
the worst case, we have the following result. 

Theorem 3 The algorithm Turbo-BOM is

(i) linear considering the number of inspections of characters in the text.
The number of these inspections is less than 2n.

(ii) linear considering the number of comparisons of characters. The number
of these comparisons is less than 2n when the transitions of the oracle
are available in $O(1)$ and less than $2n + n \log |\Sigma|$ when the transitions
are available in $\log |\Sigma|$.

Fig. 11. Second case: the critical position is reached



3.2 Experimental Results 

In this section, we present the experimental results obtained. More precisely, we 
compare the following algorithms. 

- Sunday: the Sunday algorithm [15], often considered the fastest in
practice,

- BM: the Boyer-Moore algorithm [6],

- BDM: the classical Backward Dawg Matching with a suffix automaton [11],

- Suff: the Backward Dawg Matching with a suffix automaton but without
testing terminal states; this is equivalent to the basic approach with the
factor automaton¹,

- BOM: the Backward Oracle Matching with the factor oracle,

- BSOM: the Backward Oracle Matching with the suffix oracle. This latter
structure is not described in this version of the paper, but can be found
in [1],

- Turbo-BOM: the linear algorithm using BOM and KMP with α = 1/2.

Our string matching experiments are done on random texts of size 10 Mb with
an accuracy of ±2 % with a confidence of 95 % (which may require thousands of
iterations) for alphabets of size 2, 4, 16 and 32. The machine used is a PC with a
Pentium II processor at 350 MHz running the Linux 2.0.32 operating system. For
all the algorithms, the transitions of the automata are implemented as tables,
which allow $O(1)$ branching. This is not realistic, however (especially for the
suffix automaton), when the alphabet becomes rather big (for instance for 16-bit
character coding). Moreover, the Sunday algorithm becomes unusable as it is when
the alphabet is big, because it relies mainly on a character table.

¹ The suffix automaton without taking into account the terminal states (i.e.
considering every state as terminal) and the factor automaton recognize the same
language. The difference is that the factor automaton is minimal, so its size is
smaller than or equal to the size of the suffix automaton. But the difference in
size is not significant in practice, and in any case not significant enough to
justify the implementation of a factor automaton, which would complicate and
slow down the preprocessing phase of the string matching algorithm.

Experimental results in string matching are always surprising because the codes
are small and the time taken by a comparison is not much greater than the time
taken by an index increment. This is, for instance, the reason why the Sunday
algorithm (when it is usable) is the fastest algorithm for small patterns: its
window shifts are very small, but very few operations are necessary to obtain
each shift. It is also the reason why BDM is slower than Suff whereas the window
shifts in BSOM and BDM are greater.

The 4 subfigures of Figure 12 show that BOM is as fast as Suff (except on
a binary alphabet), which is much more complicated and requires much more
memory.

It is obviously useless (in the case of searches in texts of characters) to mark
and test terminal states in both the suffix automaton and the factor oracle.

The Turbo-BOM algorithm is the slowest, but it is the only one that can be used
in real time, and in that case its behavior is rather good. It should be noted
that we arbitrarily set the value of α to 1/2. However, according to the tests we
have carried out for different values of α, it turns out that α = 1/2 is most
often the best value and that the variations of search times with other values of
α (as long as they stay between $(2 \log_{|\Sigma|} m)/m$ and
$(m - 2 \log_{|\Sigma|} m)/m$) are not very significant and in any case do not
by themselves deserve a detailed study.

4 Conclusions 

The new structure we presented, the factor oracle, allows new string matching
algorithms. These algorithms are very efficient in practice, as efficient as the
ones that already exist, but are far simpler to implement and require
less memory. According to the experimental results, we conjecture that they
are optimal on average (under a model of equiprobability of letters), but this
remains to be shown.

About the structure of factor oracle itself, many questions stay open. Among 
others, it would be interesting to have a characterization of the language recog- 
nized by the oracle. 

It would also be interesting to have a study of the average number of external 
transitions in the oracle. It would give an idea of the average memory space 
required by the string matching algorithms. 

Finally, we notice that the factor oracle is not minimal considering the number
of transitions among the automata of m + 1 states which recognize at least
the factors. An example is given in Figure 13. This reduced automaton may also
be used in string matching provided that its construction can be done in linear
time. This construction remains an open problem.

Fig. 12. Experimental results in time of the string matching algorithms (BM,
Sunday, BDM, BSOM, BOM, Turbo-BOM, Suff) on random texts of size 10 Mb on
alphabets of size 2, 4, 16 and 32. The X-axis represents the length of the
pattern and the Y-axis the search time in 1/100th seconds per Mbyte

Fig. 13. The factor oracle is not minimal considering the number of transitions
among the automata of m + 1 states which recognize at least the factors:
(a) factor oracle, (b) reduced automaton

Abstract. Forecasting procedures are needed when there is uncertainty
about the future. In our contribution we discuss some principles that can
help to make more accurate forecasts and help to better assess the
uncertainty associated with forecasts. We mainly discuss statistical forecasting
procedures, but other principles based on experts (judgmental forecasting)
and integrating and combining approaches are also mentioned. We
show some results of forecasting in two different application areas.



1 History 

Forecasting is an important phenomenon for all individuals, communities and
societies. The importance of forecasting the future has been recognized since
very ancient times. Ancient prophets were celebrated for their successful
predictions and cursed when their forecasts failed. Over the centuries,
different forecasting methods have been developed, improved and applied.
For example, Ptolemy's concept of the Universe was developed 1900 years ago,
and astronomers could predict the most significant sights in the sky. Later,
systematic errors were identified by astronomers using a great many
observations. The Copernican concept of the Universe represented a significant
innovation, allowing astronomers to predict the movement of the stars with
fascinating accuracy. Modern astronomy is, of course, more accurate than
Copernican astronomy. Similar progress can be traced in the theory of motion
through the concepts of Aristotle, Galileo, Newton and Einstein.

Forecasting during the first six decades of this century was oriented to
“judgmental” or “by hand” forecasts; numerical calculations were difficult and
extremely time-consuming, and the practical applicability of such approaches was
limited to simple time series. Nevertheless, much useful forecasting knowledge
was gained. Gordon [23] found that the average judgment from groups of
human forecasters was substantially more accurate than that of the typical
individual expert. Ogburn [33] and MacGregor [30] found that judgmental
forecasts were strongly influenced by the desired outcome (optimism bias). Some
experts concluded [18] that forecasters should use as long a time series as
possible. This principle sometimes conflicts with the principle of using the most
relevant data, which typically means the most recent data. Jarvik [26] identified
the “gambler’s fallacy”, whereby people expect that earlier outcomes will affect
the next one in games of chance. This phenomenon can be observed in many
other areas of human behavior. In Brown’s [9] extrapolation model, the
historical data were weighted with exponentially decreasing weights. This produces
more accurate predictions for rapidly changing (nonstationary) data. Ferber [19]
declared that the fit error of a model was a poor criterion for its forecasting
ability. This was a new message for many statisticians at that time. Theil [52]
developed useful measures for assessing time series forecast errors. His
U-statistic allows a relative comparison of different forecasting methods.

Since 1960 the wide advent of computing machines has opened a much wider
area for forecasting methods. Increasing computing power has enabled not only
a quantitative extension of previously known principles; fundamentally new
approaches have also been developed and applied.



2 Why Forecast? 

Regardless of the progress in forecasting methods during the last decades, two
important points should be mentioned. The first point is that successful
forecasting is not always directly useful to managers and other users. (Jules
Verne correctly predicted submarines, travel to the moon etc., but this was of
no help to engineers in constructing such inventions.) The second point is that
forecasting procedures are needed only if there is uncertainty about the future.
There is no uncertainty when one can control events. For example, we do not need
to predict the temperature in the refrigerator. Forecasting itself is only one
step (following data collection, archiving and checking). The forecasting results
form the inputs for the subsequent planning and decision making step.
Schematically, the three steps of the whole process are shown in Fig. 1. Of
course, a more complicated scheme with feedback connections can be taken into
account. If the outputs are not satisfactory, the plans can be revised. This
process can be repeated until the forecasts are satisfactory. Revised plans are
then implemented and actual outputs are monitored for use in the next planning
period.




Fig. 1. Forecasting role in the planning and decision making process

It is now necessary to emphasize the difference between forecasting and
planning. Whereas planning is concerned with what the world should look like,
forecasting is concerned with what it will look like. However, in practice, many
organizations revise the forecasts, not the plans. This is done in the belief
that forecasts affect behavior. Sometimes this can be true. For example, the
expected exchange rate between USD and EUR (or CZK) can affect the behavior of
dealers.

3 Economical and Legal Aspects of Forecasting 

Forecasting procedures are very useful in many areas of human activity,
including medicine, the economy, the energy industry, banking,
telecommunications, transportation, the environment etc. Organizations invest
enormous amounts of money in forecasts about new products, factories, retail
outlets etc. Companies expect that their investment in more accurate forecasting
technologies will bring more profit in the future. In many cases a lot of money
can be saved when forecasters improve their predictions. It was reported in [10]
that, for electric companies in the UK, an increase of 1 % in the forecasting
error of the electric load would be associated with an increase in operational
costs of about 10 million British pounds per year (prices from 1984). What
happens if the forecast is wrong and the economic or other losses are enormous?
Here are some cases cited in [5]:

“Four Massachusetts fishermen were lost at sea on November 21, 1980
because, their families claimed, of an incorrect weather forecast. Three
families brought suit and won an initial judgment on the ground that the
National Weather Service was negligent in failing to repair a weather
buoy that might have provided useful data. ... The key issue in
this case was not the fact that the forecast was wrong, but whether the
National Oceanic and Atmospheric Administration (NOAA) failed to
take reasonable steps to obtain accurate data. Another issue was that
when key information was no longer available, NOAA did not notify the
users of the forecast.”

“In a British case, Esso Petroleum vs. Mardon (London, 1966 E.
no. 2571), Mardon entered into a contract with Esso to own and operate
a gas station. A critical part of the negotiations was the forecast that the
station would sell 200,000 gallons of gas per year by the third year. The
actual sales fell well short of the forecasted figure, and Mardon went out
of business. Esso sued Mardon for unpaid bills. Mardon then countersued
on the basis that the Esso forecast misrepresented the situation. In effect,
Esso had originally forecast the 200,000 gallon figure under the
assumption that the gas pumps would face the road. After a zoning hearing, they
were forced to change the design so that the pumps were not visible from
the road. Despite this significant unfavorable change, Esso then used the
200,000 gallon forecast in the original contract with Mardon. Mardon
won; the court concluded that Esso had misrepresented the situation.”

“In Beecham vs. Yankelovich, Beecham alleged that an inaccurate market
forecast prepared by Yankelovich Clancy Shulman resulted in a $24 million
loss. Yankelovich, on the other hand, claimed that Beecham provided
incorrect inputs to the forecasting models, and that they failed to
follow the marketing plan; for example, they changed advertising claims
and reduced promotional expenses (Adweek’s Marketing Week, 7 December
1987, pp. 1, 4).”

These cases imply that if there is no special contract between the forecaster
and the user, forecasters are unlikely to be held liable. In all cases forecasts
contain uncertainty, and a reasonable balance between losses and benefits should
be taken into account. Forecasters can be held liable if it can be shown that the
forecasts were not obtained by reasonable practice, and only if the poor practice
influenced the results.

4 The Forecasting Process 

There are many forecasting methods, and the choice of a particular one
depends upon the situation. For example, for long-range forecasting of the
market, econometric methods are often appropriate. For short-range forecasting
of costs, sales, or market share, extrapolation methods are useful. There is no
universal (best) forecasting method that can be applied in all areas and for
all prediction horizons. In this part we show general aspects of the forecasting
methodology which can lead to the development of “optimal” prediction
models. Schematically, the whole forecasting process is shown in Fig. 2.



4.1 Problem Identification and Description 

The first step in forecasting is to specify the prediction problem. The definition
of the problem is sometimes the most difficult aspect. It is necessary to
understand in depth how the forecast will be used, who requires the forecasts,
which variables will be predicted, how long the forecasting horizon is, which
data are available in the proper (e.g. digital) form, what the nature of the
processed data is and how they might be transformed. It is worth spending some
time talking to the experts who are involved in collecting the data, forming the
database, and using the forecasts in the planning and decision making process.
Forecasters should know as much as possible about the whole concept of the
system being built.



4.2 Relevant Input Variables Selection 

This aspect of the forecasting methodology is very important and the proper 
selection of suitable input variables has a great effect on the final prediction 
results. Correlation analysis can be used when linear models are good candi- 
dates. But, the variable selection for nonlinear models is very difficult. Cor- 
relation analysis cannot be used, because it only detects linear dependencies. 
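
A small numerical sketch shows why correlation analysis alone can miss a relevant input of a nonlinear model. The helper `pearson` and the synthetic data are invented for illustration: the same input x determines both a linear target and a quadratic one, yet the correlation coefficient detects only the first dependence.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences.

    Assumes neither sequence is constant (otherwise the denominator is zero).
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Synthetic input, symmetric around zero.
xs = [x / 10 for x in range(-50, 51)]
linear_target = [2 * x + 1 for x in xs]   # linear dependence on x
quadratic_target = [x * x for x in xs]    # purely nonlinear dependence on x

r_linear = pearson(xs, linear_target)        # close to 1
r_quadratic = pearson(xs, quadratic_target)  # close to 0, although y depends on x

if __name__ == "__main__":
    print(r_linear, r_quadratic)
```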



Principles of Forecasting - A Short Overview 







Fig. 2. Aspects of the forecasting methodology 

One approach is to apply statistical tests for detecting nonlinearities.
Some tests can be found in [54] and [53]. Other interesting approaches, based
on mutual information between two or more variables, are presented in [16]
and [37]. These methods can help in proper variable selection as well as in
deciding whether a linear or a nonlinear model is necessary. Another useful
approach is based on a heuristic selection of possible input variables and on
the comparison of results from the different models. Any a priori information
(judgmental or received from key persons) is valuable during this step (e.g. we
know a priori that outdoor temperatures influence electricity consumption). This
heuristic approach can be combined with search techniques such as genetic
algorithms [34].

4.3 Prediction Quality Specification

Because there is no single universally accepted measure of accuracy in time
series forecasting, the specification of the prediction error should be
considered very carefully. The mean squared error (MSE) and the mean
absolute percentage error (MAPE) are the most popular criteria for
evaluating prediction quality. In some cases the maximal errors or the whole
prediction error distribution play a more significant role than the usual MSE or
MAPE values. Theil’s U-statistic, the Durbin-Watson statistic or some other
statistics can also be very useful measures (see [31]). The traditional fitting
techniques are based on minimizing the MSE function for data from the
“training set”. Other fitting methods, which penalize large prediction
errors more heavily, can be applied to reduce the number of badly wrong forecasts.
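
The two most common criteria named above can be written down in a few lines. The following is a minimal sketch with hypothetical function names `mse` and `mape`; it assumes that actual values and forecasts are aligned sequences of equal length and, for MAPE, that no actual value is zero.

```python
def mse(actual, forecast):
    """Mean squared error between actual values and forecasts."""
    if len(actual) != len(forecast) or not actual:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual)


def mape(actual, forecast):
    """Mean absolute percentage error, in percent.

    Undefined when an actual value is zero; this sketch raises in that case.
    """
    if len(actual) != len(forecast) or not actual:
        raise ValueError("sequences must be non-empty and of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    return 100.0 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)


if __name__ == "__main__":
    observed = [100.0, 200.0, 400.0]
    predicted = [110.0, 190.0, 400.0]
    print(mse(observed, predicted), mape(observed, predicted))
```

MSE weights large errors quadratically, whereas MAPE is scale-free, which is one reason the two criteria can rank the same models differently.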

4.4 Choosing a Forecasting Model 

There is no universal forecasting method. Each model is based on a set of
assumptions and involves one or more parameters which must be estimated using
the known data sets. The selection must be made considering the characteristics
of the data and the frequency of the required forecasts. For example, if we need
to forecast hundreds of items on a daily basis, there is no time to interact
with prediction software to build new models. A simple and reliable automatic
prediction system should be implemented in such a case. There is a family of
statistical parametric models which try to model a relationship between the
predicted values and the inputs from the past data. This family includes
decomposition models, exponential smoothing models, regression models,
Box-Jenkins ARIMA models, regression models with ARIMA errors, artificial
neural networks, intervention models, state space models and others. If the data
include trend, seasonality or other “regular” components, a decomposition model
helps us to estimate these parts. If there are strong correlations in the data,
an ARIMA model can successfully detect the linear relationships. Nonlinear
dependencies can be modelled by artificial neural networks. Exponential
smoothing can be applied for short-term extrapolations of “short” time series.
Selection of a suitable prediction model is still in the domain of “art”.



4.5 Fitting Method Specification 

Ordinary least squares, maximum likelihood and other approaches have been
developed for estimating (fitting) the parameters of the models from available
data sets. Many other optimization methods have been used for minimizing
selected error functions (sometimes with a penalty term) with respect to the
model parameters. The most popular method for neural networks is the
backpropagation algorithm, which is a steepest-descent gradient method. A number
of modifications of this algorithm have been proposed. Among them, second-order
methods such as BFGS and Levenberg-Marquardt are more efficient and can
significantly reduce the fitting time and improve the prediction accuracy. In the
case of a non-continuous error function some other methods should be selected
(e.g. those based on stochastic gradient methods).

4.6 Fitting and Validation Data Specification 

The suitability of a particular forecasting method is measured by a selected
error function. This function should not reflect the “goodness-of-fit”; it must
reflect the ability to forecast the future. The available data are therefore
split into fitting and validation sets. It is crucial that both parts be
representative. An inappropriate separation will affect the forecasting
performance. Most authors select a specific part of the available data (e.g.
70 %) for fitting and the rest for the validation process. Other authors employ
a bootstrap technique [24]. Another factor is the sample size. If we have huge
data sets, which part should be used for optimal model fitting? If there are not
enough data, which model can or cannot be estimated? No exact rule exists for
the determination of the sample size for a given problem, but a larger sample
can be required when a more complex relationship is modelled or the noise in the
data increases. The selection of the sample size is also related to the number
of parameters. A frequently cited heuristic rule is that the sample size should
be 10 times greater than the number of parameters in the model [55].
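
The fixed-fraction split mentioned above can be sketched as follows. The function name `split_series` is hypothetical, and the sketch assumes a chronological split (the earliest observations are used for fitting), which is one common choice for time series rather than something the text prescribes. The second helper merely encodes the quoted 10-to-1 rule of thumb.

```python
def split_series(series, fit_fraction=0.7):
    """Split a time-ordered series into a fitting part and a validation part.

    The first `fit_fraction` of the observations is used for fitting,
    the remainder for validation (chronological order is preserved).
    """
    if not 0.0 < fit_fraction < 1.0:
        raise ValueError("fit_fraction must lie strictly between 0 and 1")
    cut = int(round(len(series) * fit_fraction))
    return series[:cut], series[cut:]


def sample_size_is_plausible(n_samples, n_parameters, factor=10):
    """Heuristic check: at least `factor` samples per model parameter."""
    return n_samples >= factor * n_parameters


if __name__ == "__main__":
    fit_part, validation_part = split_series(list(range(10)))
    print(fit_part, validation_part)
    print(sample_size_is_plausible(len(fit_part), 3))
```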

4.7 Model Architecture Optimizing 

The aim of the architecture optimizing process is to build systems with better
forecasting ability. Over-parametrized models can be reduced by special
pruning techniques. Sensitivity and tolerance analysis can optimize the model
structure while respecting the principle of simplicity. This principle (the
principle of parsimony, or Occam’s Razor) says that as few parameters as
possible should be used in fitting a model to a set of data. The second approach
to this optimizing step is based on combining predictions. When we have more
than one forecast, we can combine them to obtain another one. Experience shows
that, in the majority of cases, such combining results in more accurate
“post-sample” predictions than those obtained from the individual forecasts.
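
Combining predictions can be as simple as a weighted average of the individual forecasts. The sketch below is only an illustration: the name `combine_forecasts` and the default of equal weights are assumptions, since the text does not say which combination scheme is meant.

```python
def combine_forecasts(forecasts, weights=None):
    """Combine several forecasts of the same horizon by a weighted average.

    forecasts: list of equally long sequences, one per forecasting model.
    weights:   one weight per model; equal weights are used when omitted.
    """
    k = len(forecasts)
    if k == 0:
        raise ValueError("at least one forecast is required")
    if weights is None:
        weights = [1.0 / k] * k
    if len(weights) != k:
        raise ValueError("one weight per forecast is required")
    horizon = len(forecasts[0])
    if any(len(f) != horizon for f in forecasts):
        raise ValueError("all forecasts must cover the same horizon")
    return [
        sum(w * f[h] for w, f in zip(weights, forecasts))
        for h in range(horizon)
    ]


if __name__ == "__main__":
    model_a = [10.0, 20.0]
    model_b = [14.0, 22.0]
    print(combine_forecasts([model_a, model_b]))
    print(combine_forecasts([model_a, model_b], weights=[0.75, 0.25]))
```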



4.8 Testing 

When the forecasting models have been developed using available data sets, it
is necessary to test them on new, previously unknown data. This should be done
before the final implementation of the forecasting models. During this step,
some “wrong” reactions to “unexpected” events can be tuned. For example,
missing data and outliers cause trouble. The comparison with expert judgments
yields a good idea of the usefulness of our forecasting systems. The question
of when our system should be “retrained” should also be answered. In addition,
forecasting accuracy is not the only criterion for the evaluation of our models.
If the managers using our forecasts change their scenario, then the predictions
will not come true.

4.9 Model Implementation and Use

Implementing forecasting models so that they deliver new information (reliable
and timely) to management is often at least as important as the forecasts
themselves. Forecasters should provide a summary of the forecasting methods and
supporting data in a simple and understandable form. It is very valuable if
prediction intervals are presented together with the point forecasts. The
forecasting system should enable user-friendly on-line connections to data
sources, data editing, preprocessing, parameter setting, forecasting, graphical
representations and statistical evaluations.

5 Forecasting Formulas 

The k-step ahead forecast of a future complex M-dimensional time series
can be realized by a set of functions

$$F(X, P, t+k) = \{F_{1,k}(X_1, P_1), F_{2,k}(X_2, P_2), \dots, F_{M,k}(X_M, P_M)\} \qquad (1)$$

where $F_{j,k}$, $j = 1, \dots, M$, are the functions forecasting the marginal
variables. The vector $X_j$ represents the input variables available at time $t$
and $P_j$ is a vector of parameters. For the one-dimensional case and the
one-step ahead forecast of the time series $Y(t)$, the previous formula reduces
to

$$\hat{Y}(t+1) = F(X, P). \qquad (2)$$

Input variables can include external (independent) variables as well as
internal inputs (the past values of $Y$, other unobservable components such as
the noise terms in ARMA models, etc.). If the function $F$ is linear with
respect to a set of inputs $X = \{x_1, x_2, \dots, x_n\}$, then the forecast is
realized by the linear regression formula

$$\hat{Y}(t+1) = a_1 x_1 + a_2 x_2 + \dots + a_n x_n + c, \qquad (3)$$

where $P = \{a_1, a_2, \dots, a_n, c\}$. If $x_1 = t, x_2 = t^2, \dots, x_n = t^n$,
then the forecast is realized by the polynomial regression

$$\hat{Y}(t+1) = a_1 t + a_2 t^2 + \dots + a_n t^n + c. \qquad (4)$$

When we have $x_1 = Y(t), x_2 = Y(t-1), \dots, x_n = Y(t-n+1)$, the forecast is
realized by the autoregressive (AR) model

$$\hat{Y}(t+1) = a_1 Y(t) + a_2 Y(t-1) + \dots + a_n Y(t-n+1) + c. \qquad (5)$$

The parameters could also change over time, so we have an adaptive AR forecast
with time-dependent (usually slowly varying) parameters

$$\hat{Y}(t+1) = a_1(t) Y(t) + a_2(t) Y(t-1) + \dots + a_n(t) Y(t-n+1) + c(t). \qquad (6)$$

If we have $X = \{Y(t), \hat{Y}(t)\}$, then a single exponential smoothing
forecast is given by

$$\hat{Y}(t+1) = \hat{Y}(t) + \alpha \, (Y(t) - \hat{Y}(t)) \qquad (7)$$

where $\alpha$ is a parameter from the interval (0, 1). This formula is
equivalent to the autoregressive forecast with an infinite number of
exponentially decreasing parameters.
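
Equations (5) and (7) translate almost directly into code. The sketch below uses hypothetical names (`ar_forecast`, `exponential_smoothing_forecast`) and makes two assumptions that the text does not fix: the AR coefficients are taken as given rather than estimated, and the smoothing recursion is started from the first observation unless another initial forecast is supplied.

```python
def ar_forecast(history, coeffs, c=0.0):
    """One-step-ahead AR forecast as in equation (5).

    history: observations ordered in time, the last element being Y(t).
    coeffs:  [a_1, ..., a_n], applied to Y(t), Y(t-1), ..., Y(t-n+1).
    """
    n = len(coeffs)
    if len(history) < n:
        raise ValueError("need at least as many observations as coefficients")
    recent = history[::-1][:n]  # Y(t), Y(t-1), ..., Y(t-n+1)
    return sum(a * y for a, y in zip(coeffs, recent)) + c


def exponential_smoothing_forecast(series, alpha, initial=None):
    """One-step-ahead single exponential smoothing forecast as in equation (7).

    The first forecast defaults to the first observation (an assumption).
    """
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie in the interval (0, 1)")
    if not series:
        raise ValueError("series must not be empty")
    forecast = series[0] if initial is None else initial
    for y in series:
        # Correct the previous forecast by a fraction alpha of its error.
        forecast = forecast + alpha * (y - forecast)
    return forecast


if __name__ == "__main__":
    # With a_1 = 2 and a_2 = -1 the AR forecast extrapolates a straight line.
    print(ar_forecast([1.0, 2.0, 3.0, 4.0], [2.0, -1.0]))
    print(exponential_smoothing_forecast([10.0, 20.0, 30.0], alpha=0.5))
```

Unrolling the recursion in `exponential_smoothing_forecast` gives weights α, α(1 - α), α(1 - α)², ... on Y(t), Y(t-1), Y(t-2), ..., which is the exponentially decreasing pattern mentioned in the text.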

For the nonlinear case, neural networks (a three-layered perceptron) can model
the relationship by the function

$$F(x) = f\Big(\sum_{j=0}^{\dots} \dots \Big(\sum_{i=1}^{\dots} \dots\Big)\Big) \qquad (8)$$

where $f$ and $f_j$ are sigmoidal functions of the form

$$f(x) = \dots \qquad (9)$$

where $a$, $b$, $v_j$ are parameters and $x_j$ are components of the input
vector $X$.

Of course, many other formulas can be derived. Some models are theoretically
equivalent (or asymptotically equivalent) under some reasonable conditions. For
example, it has been proved that any continuous function can be approximated
by a three-layered perceptron. But it would be odd to apply neural networks
to model a polynomial regression, which is generally nonlinear but
linear in its parameters. Another example is that the single exponential
smoothing model with one parameter can be expressed as an AR process of infinite
order (an infinite number of parameters). Estimating only one parameter is a
much more elegant procedure.



6 Prediction Intervals 

In many application areas it is valuable to provide not only a point forecast
but also some limits for the possible “worst” (or best) values. This can be done
in the form of a prediction interval (or a set of forecast limits). The
derivation of such an interval is based on statistical theory and probability
distributions. The forecast limits are associated with some percentage level,
e.g. 95 %. This means that the prediction interval will contain the observation
with probability 95 %. The prediction interval can be derived analytically for
some models (for normally distributed errors). If the forecast errors are
normally distributed with zero mean (unbiased forecast), then the 95 %
prediction interval can be estimated by

$$\hat{Y}(t+1) \pm 1.96 \sqrt{MSE} \qquad (10)$$

where MSE means the mean square of the forecast errors. A similar formula
can be derived for k-step ahead forecasting. Sometimes these estimates
produce very wide intervals which are of little practical use.
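
Formula (10) can be evaluated from a point forecast and a sample of past forecast errors. The following is a minimal sketch with a hypothetical function name; it relies on the same assumptions as the text (normally distributed, zero-mean errors) and estimates MSE from the supplied errors.

```python
import math


def prediction_interval(point_forecast, errors, z=1.96):
    """Approximate prediction interval around a one-step-ahead forecast.

    errors: past forecast errors, assumed normal with zero mean.
    z:      normal quantile; 1.96 corresponds to the 95 % level.
    """
    if not errors:
        raise ValueError("at least one past forecast error is required")
    mean_squared_error = sum(e * e for e in errors) / len(errors)
    half_width = z * math.sqrt(mean_squared_error)
    return point_forecast - half_width, point_forecast + half_width


if __name__ == "__main__":
    lower, upper = prediction_interval(100.0, [2.0, -2.0, 2.0, -2.0])
    print(lower, upper)
```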



7 Judgmental Forecasting 

Statistical forecasting methods try to find and extrapolate existing
relationships, and they suppose that such relationships will not change
dramatically in the future. (Slow changes can be modelled by adaptive
statistical models.) Changes should be detected as soon as possible to reduce
forecasting errors. Human judgment is needed to correct the forecasts using
“inside information” and expert knowledge. However, judgmental forecasts have
some limits and considerable biases. People do not want to be held responsible
if their forecast is wrong. The cost of a human forecast is significantly higher.
There is strong inconsistency in human forecasting (people change their minds
when there is no need to, and they are influenced by their mood or by the
opinions of their colleagues or spouse, or they might be bored). Therefore many
companies combine statistical forecasting with judgmental forecasts to improve
the inputs for their decision making process.



8 Forecasting Software 

A large number of forecasting software packages are available on the market. 
Some companies (e.g. Siemens, Honeywell) produce complete “tailored” solutions 
and implement suitable forecasting models in the client's information system. 
Many other forecasting packages exist, and selecting a suitable one can be 
difficult. Some small specialized packages contain forecasting instruments which 
are not implemented in the large general statistical packages. Some packages are 
easy to use and can be operated by end users; others can be applied by experts only. 
The ability to process large data sets with many variables is another criterion for 
selecting a suitable package. The most popular statistical packages are SAS with the 
ETS component, SPSS, S-Plus and MiniTab. Examples of specialized software are 
Forecast Pro, Autobox, NeuralWorks/Predict and SIBYL/Runner. There are also 
add-on forecasting functions implemented in spreadsheet packages such as Excel, 
Lotus 1-2-3 and Quattro Pro. 



9 Forecasting Journals, Conferences and Internet Addresses 

Besides journals in statistics, economics, environmental science and other fields, 
which often include articles on forecasting, there are three specialized journals 
devoted to forecasting. The “International Journal of Forecasting” 
and the “Journal of Forecasting” are the leading journals in the field. 
They publish theoretically and practically oriented papers covering all areas 
of forecasting. The Journal of Business Forecasting mainly includes papers on 
practical aspects of forecasting. 

There are two forecasting associations. The International Institute of Forecasters 
(IIF) is an association that includes both academics and practitioners. 
It hosts the annual International Symposium on Forecasting (ISF). 
The International Association of Business Forecasters (IABF) includes mostly 
people interested in business forecasting. These two associations publish a 
joint newsletter (The Forum) which includes information about forecasting 
conferences, reviews of forecasting software, short articles and news about 
forecasting. 

The information presented above, together with FAQs (frequently asked questions), 
time series data, links to other resources and more, can be found at the IIF 
home page 

http://forecasting.cwru.edu. 

Since 1998, the NOSTRADAMUS prediction conferences have been organized 
in the Czech Republic. Further information can be found at 

http://ft3.zlin.vutbr.cz/NOSTRA/NOSTRA.htm. 



10 Two Examples 

10.1 Foreign Exchange Rate Forecasting 

The foreign exchange rate is a very complex time series which is influenced by 
many factors. We tried to predict high-frequency foreign exchange quotes 
of the USD/DEM currency pair collected by Olsen & Associates in the period 
1992-1993. We predicted the median of the quotes calculated by the moving 
window technique with a window size of 50 quotes. We had no “inside” information 
about the time series, so we decided to use statistical methods to find the 
relationships between the past data and the predicted quantities. The average 
prediction horizon (APH) was about 18 minutes, so we tried to predict the 
exchange rates for the next 18 minutes on average. I asked three of my colleagues 
(Dr. Ladislav Pecen, Dr. Petr Klan and Dr. Petr Berka) to develop their models 
independently using the data from six months (November 1992 - April 1993). 
Five statistical methods were developed by me and by my colleagues, and we 
tested their performance on data from May 1993. We applied the stochastic 
differential equation based predictor ST and the smoothing spline predictor SP 
(developed by L. Pecen), the convex combination based predictor CC (developed by 
P. Klan), the neural network model NN (developed by E. Pelikan) and the rule-based 
predictor KEX (developed by P. Berka). The results of our experiment are summarized 
in Table 1. Comb denotes the strategy in which a prediction is issued only when 
all five predictors produce the same result (next trend up or next trend down); 
trend is the trend agreement score; ts is the trading score, representing the 
average theoretical profit per transaction (without transaction costs); and num 
is the number of forecasts the predictor issued during the given period. All 
results are compared with the “ideal” solution. We can see that all methods 
produced better results than the “coin tossing” strategy (trend scores are above 
50 %). The combining strategy improved the prediction performance because the 
prediction models were developed independently using different principles and 
different sets of past values. 
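
The Comb strategy described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: each predictor is assumed to emit a trend signal of +1 (up) or -1 (down), the trend agreement score is assumed to be the percentage of issued predictions whose direction matched the realized trend, and the toy signals below are hypothetical, not the experimental data.

```python
def combine_unanimous(signals):
    """Return the common signal (+1 or -1) if all predictors agree, else None."""
    first = signals[0]
    return first if all(s == first for s in signals) else None


def trend_score(predicted, actual):
    """Return (percentage of correct directions, number of issued predictions).

    Time steps where no prediction was issued (None) are skipped.
    """
    issued = [(p, a) for p, a in zip(predicted, actual) if p is not None]
    if not issued:
        return None, 0
    hits = sum(1 for p, a in issued if p == a)
    return 100.0 * hits / len(issued), len(issued)


# Hypothetical signals of five predictors at four time steps
per_step_signals = [
    [+1, +1, +1, +1, +1],
    [+1, -1, +1, +1, -1],
    [-1, -1, -1, -1, -1],
    [-1, +1, -1, -1, -1],
]
actual = [+1, -1, +1, -1]

combined = [combine_unanimous(s) for s in per_step_signals]
score, num = trend_score(combined, actual)
print(combined, score, num)
```

The sketch also shows the price of the strategy: the combined predictor issues far fewer forecasts than the individual ones, which matches the smaller num value reported for Comb.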



Table 1. The forecasting performance of five different methods and their 
combination on the USD/DEM foreign exchange rate time series (forecasts 
18 minutes ahead on average) 



du50    APH (min): 18    Period: May 93 

method      ST      SP      CC      NN      KEX     Comb    ideal 
trend (%)   52.3    56.4    59.1    58.8    52.5    61.5    100 
ts          0.08    0.578   1.09    1.187   0.561   1.777   5.347 
num.        2285    2285    2285    2285    2285    413     2285 



10.2 Electric Load Forecasting 

In contrast to the previous case, we have much more “inside” information 
about the electric load time series. We spent a lot of time with experts 
from power distribution companies, who advised us on the possible dependencies 
between the electric load and the outside temperature, luminosity, type of day, 
wind speed etc. Therefore we could build explanatory forecasting models for a 
prediction horizon of several days ahead. We used, for example, a sigmoidal 
function to model the relationship between the electric load values and the 
daily averaged temperatures. It was inspired by the fact that the electric load 
increases with decreasing temperature up to some limit. 
A saturation effect appears at very low as well as at very high temperatures. 
Beyond these limits the electric load is practically independent of 
the outside temperature (we did not consider an “air-conditioning effect” in the 
Czech Republic). We also observed a higher sensitivity of the load to illumination 
during the spring and autumn periods. This sensitivity is nonlinear. 
In Fig. 3 we show the relationship between the daily electric load, 
the cloud cover variable and the season, represented by the week number. 
This relationship is modelled by a neural network with two inputs. Because the 
complexity of our neural network is very low, the model is easily understandable 
and can be used as a component of our explanatory forecasting model. 
The model was implemented in a power distribution company. The forecasting 
errors vary around 2 %, which is very close to the errors of the measurement 
devices. 
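
A sigmoidal load-temperature relationship with saturation at both ends can be sketched as follows. The functional form (a logistic curve) and all parameter names and values are assumptions chosen for illustration; the text does not state which sigmoid or which parameters were used in the implemented model.

```python
import math


def load_from_temperature(temp_c, low_load, high_load, midpoint_c, steepness):
    """Logistic sketch of the electric load as a function of temperature.

    The load approaches high_load at very low temperatures and low_load at
    very high temperatures (saturation); steepness > 0 controls how fast the
    load falls around midpoint_c.
    """
    return low_load + (high_load - low_load) / (
        1.0 + math.exp(steepness * (temp_c - midpoint_c))
    )


# Hypothetical parameters: loads in MW, temperatures in degrees Celsius
for temp in (-40.0, -5.0, 5.0, 15.0, 40.0):
    load = load_from_temperature(temp, 400.0, 900.0, 5.0, 0.3)
    print(f"{temp:6.1f} C -> {load:7.2f} MW")
```

In practice the four parameters would be fitted to historical load and temperature data.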




Fig. 3. Relationship between the daily electric load, the cloud cover variable (0 - 
clear sky, 1 - overcast) and the season (week number within the year), modelled 
by a neural network with two inputs 



11 Conclusion 

In this paper we summarize the basic steps of the forecasting methodology, 
including identification and description of the problem, specification of the 
prediction quality, choice of a forecasting model, specification of the fitting 
method, specification of the fitting and validation data sets, optimization of the 
model architecture, testing, and model implementation and use. We also mention 
some legal and economic aspects of forecasting, as well as some forecasting 
software, journals, conferences and internet addresses. Using two examples from 
the banking and energy sectors, we demonstrate different approaches to developing 
forecasting models. 

We can conclude that, regardless of the huge progress in forecasting approaches, 
the mathematical complexity of the models, the statistical sophistication 
of the methods, large databases and the power of computers, computerized 
forecasting cannot substitute for human intuition. But, as decision support tools, 
these methods play a very valuable role in the information and knowledge 
processing area. Our forecast is that this will remain true in the future. 



Abstract. Functional logic programming integrates the best features of 
modern functional and logic languages. The multi-paradigm declarative 
language Curry is an extension of Haskell which is intended to become 
a standard in the area. In this paper, we present UPV-Curry, an efficient 
and quite complete implementation of Curry based on a new, incremental 
definition of its basic evaluation mechanism. We compare UPV-Curry with 
other existing Curry interpreters. 



1 Introduction 

Functional logic languages combine the best features of the most important 
declarative programming paradigms, namely functional and logic programming 
(see [9] for a survey). Throughout this decade, many practical proposals have 
been made to amalgamate functional and logic programming languages. However, 
these languages have not succeeded in becoming widely used by the 
functional or logic programming communities. The multi-paradigm language 
Curry [10,8] is an extension of Haskell [12] which is supported by an international 
initiative to make it a standard in the area. In order to facilitate and 
extend the use of Curry, it is essential to make efficient and practical 
implementations available. Program transformation techniques whose goal is to derive 
better, semantically equivalent programs have recently been adapted for use 
in Curry [1,4,5]. In this paper, we consider a different approach to improving 
the execution of Curry programs, which relies on an incremental definition of the 
Curry operational machinery. We present UPV-Curry, a novel implementation of 
Curry which provides an almost complete implementation of the language. In 
Curry, each evaluation step for an expression e is performed on a needed redex of 
e, i.e., a subexpression of e which is an instance of the left-hand side l of a 
(possibly conditional) program equation l | c = r and which is really necessary for 
a complete evaluation of e. As is usual in functional logic languages, variable 
occurrences of e may be instantiated as required in order to reduce this redex. 
The overall process is called needed narrowing [2]. To implement UPV-Curry, we 
have developed an incremental optimization of the basic operational machinery 
of Curry which makes the pattern matching between program equations and 
goal expressions faster and is also able to locally resume the search for a new 
redex after performing each single evaluation step. We provide an experimental 
comparison of our implementation and other existing Curry interpreters. 

* This work has been partially supported by CICYT TIC 98-0445-C03-01 and Acción 
Integrada hispano-alemana HA1997-0073. 

J. Pavelka, G. Tel, M. Bartosek (Eds.): SOFSEM’99, LNCS 1725, pp. 331-339, 1999. 

2 Preliminaries 

We assume familiarity with basic notions of term rewriting [6] and functional 
logic programming [9]. We consider a signature $\Sigma$ partitioned into a set 
$\mathcal{C}$ of constructors and a set $\mathcal{F}$ of (defined) functions or 
operations. The sets of terms and constructor terms with variables (e.g., $x, y, z$) 
from $\mathcal{X}$ are denoted by $\mathcal{T}(\Sigma, \mathcal{X})$ and 
$\mathcal{T}(\mathcal{C}, \mathcal{X})$, respectively. We write $\overline{o_n}$ for 
the list of objects $o_1, \ldots, o_n$. The set of variables occurring in a term $t$ 
is denoted by $Var(t)$. A term is linear if it does not contain multiple occurrences 
of one variable. $root(t)$ denotes the symbol at the root of the term $t$. If 
$root(t) \in \mathcal{F}$ ($root(t) \in \mathcal{C}$), the term $t$ is said to 
be operation-rooted (constructor-rooted). A pattern is a linear term of the form 
$f(\overline{d_n})$ where $f \in \mathcal{F}$ is n-ary and 
$d_1, \ldots, d_n \in \mathcal{T}(\mathcal{C}, \mathcal{X})$. 

A position $p$ in a term $t$ is a sequence of natural numbers ($\Lambda$ denotes the 
empty sequence, i.e., the root position). Positions are ordered by the prefix ordering: 
$p \le q$ if $\exists p'.\ p.p' = q$. $t|_p$ denotes the subterm of $t$ at position $p$, 
and $t[s]_p$ denotes the result of replacing $t|_p$ by the term $s$ (see [6] for 
details). $\mathcal{P}os(t)$ ($\mathcal{FP}os(t)$) is the set of positions (of 
operation-rooted subterms) of $t$. 

We denote by $\{x_1 \mapsto t_1, \ldots, x_n \mapsto t_n\}$ the substitution $\sigma$ 
with $\sigma(x_i) = t_i$ for $i = 1, \ldots, n$ (with $x_i \neq x_j$ if $i \neq j$), 
and $\sigma(x) = x$ for all other variables $x$. The 
identity substitution is denoted by $id$. Substitutions are extended to morphisms 
on terms by $\sigma(f(\overline{t_n})) = f(\overline{\sigma(t_n)})$ for every term 
$f(\overline{t_n})$. Given terms $t, s$, we write $t \le s$ if 
$\exists \sigma.\ s = \sigma(t)$; a unifier of $s$ and $t$ is a substitution $\sigma$ 
with $\sigma(s) = \sigma(t)$. 
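
The notions of this section (terms, positions, subterm selection, replacement and substitution application) can be made concrete with a small Python sketch. The representation is an assumption made only for illustration: a variable is a plain string, an application is a pair of a symbol and a tuple of arguments, and a position is a tuple of 1-based argument indices with the empty tuple standing for the root position. None of the function names come from the UPV-Curry implementation.

```python
def is_var(t):
    """Variables are plain strings; applications are (symbol, args) pairs."""
    return isinstance(t, str)


def positions(t):
    """All positions of t; the empty tuple () is the root position."""
    if is_var(t):
        return [()]
    result = [()]
    for i, arg in enumerate(t[1], start=1):
        result.extend((i,) + p for p in positions(arg))
    return result


def subterm(t, pos):
    """The subterm of t at position pos."""
    for i in pos:
        t = t[1][i - 1]
    return t


def replace(t, pos, s):
    """The term obtained from t by replacing the subterm at pos with s."""
    if not pos:
        return s
    sym, args = t
    i = pos[0]
    return (sym, args[:i - 1] + (replace(args[i - 1], pos[1:], s),) + args[i:])


def apply_subst(sigma, t):
    """Apply a substitution (dict from variable names to terms) to t."""
    if is_var(t):
        return sigma.get(t, t)
    return (t[0], tuple(apply_subst(sigma, a) for a in t[1]))


def is_prefix(p, q):
    """Prefix ordering on positions: p is above (or equal to) q."""
    return q[:len(p)] == p


# The term plus (S x) y, with Z as a constant
Z = ("Z", ())
t = ("plus", (("S", ("x",)), "y"))
print(positions(t))
print(subterm(t, (1, 1)))
print(replace(t, (2,), Z))
print(apply_subst({"x": Z}, t))
```

Immutable tuples keep replacement and substitution free of side effects, which mirrors the mathematical definitions.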

3 Curry 

Curry is a functional logic programming language that combines the best ideas of 
declarative languages such as Haskell [12] and SML [15] (functional languages), 
Gödel [11] and λProlog [16] (logic languages), and Babel [13] and [14] (functional 
logic languages). More specifically, Curry includes higher-order features, 
a type system, a module system, modern evaluation strategies, non-determinism, 
(encapsulated) search, partial data structures, existential variables, constraints, 
and declarative I/O. 

We give a schematic description of the syntax of Curry. More details can be 
found in [8]. We first show a simple example of a Curry program. 

Example 1. The following Curry program: 

data Nat = Z | S Nat 
plus :: Nat -> Nat -> Nat 
plus Z     y = y 
plus (S x) y = S (plus x y) 

contains the declaration of the data type Nat, an (optional) type declaration of 
the function plus, and the set of equations describing the function plus. 

User-defined data types are collections of values which are terms built from the 
data constructors associated with the data type being considered. For 
instance, Z and S are the data constructors of the data type Nat, and Z, S Z, 
and S (S Z) are examples of values of Nat. A Curry program mainly consists of 
a collection of data type declarations and function definitions given by 
(conditional) equations $f\ d_1 \cdots d_n\ \{|\ c\} = r$ where $f$ is the function 
symbol which the equation (partially) defines, $d_1, \ldots, d_n$ are constructor 
terms with variables, $c$ is an (optional) condition, and $r$ is the result 
expression. The condition $c$ can either be a boolean expression (in the usual sense 
of functional languages) or a constraint expression (i.e., a conjunction of 
equational constraints $e_1$ =:= $e_2$ where both $e_1$ and $e_2$ are expressions). 
An expression has the form: 

e ::= x                       % variable 
    | c e1 ... ek             % application of n-ary constructor c (0 ≤ k ≤ n) 
    | f e1 ... ek             % application of n-ary function f (0 ≤ k ≤ n) 
    | if b then e1 else e2    % conditional expression 
    | \pattern -> e           % a lambda abstraction 
    | e1 op e2                % an infix operator application 



3.1 Operational Semantics of Curry 

The operational principle of Curry uses definitional trees. Given a program 
$\mathcal{R}$, a definitional tree $\mathcal{T}$ with pattern $\pi$ (notation, 
$pattern(\mathcal{T}) = \pi$) is an expression of the form¹ [5]: 

$\mathcal{T} = rule(\pi = r')$, where $\pi = r'$ is a variant of a program equation 
$l = r \in \mathcal{R}$. 
$\mathcal{T} = branch(\pi, o, \mathcal{T}_1, \ldots, \mathcal{T}_n)$, where 
$\pi|_o \in \mathcal{X}$, $c_1, \ldots, c_n$ are different constructors for $n > 0$, 
and each $\mathcal{T}_i$ is a definitional tree with pattern 
$\pi[c_i(x_1, \ldots, x_k)]_o$, where $k$ is the arity of $c_i$ and 
$x_1, \ldots, x_k$ are new variables. 

A defined symbol $f$ is called inductively sequential if there exists a definitional 
tree $\mathcal{T}$ with pattern $f(x_1, \ldots, x_k)$ whose rule nodes contain all 
(and only) the program equations defining $f$. In this case, we say that 
$\mathcal{T}$ is a definitional tree of $f$. A program $\mathcal{R}$ is inductively 
sequential if all its defined function symbols are inductively sequential. An 
inductively sequential program can be viewed as 
a set of definitional trees, each defining a function symbol. 

Example 2. Given the following function definition: 

first Z y = [] 

first (S n) (x:xs) = x : (first n xs) 

The associated definitional tree is: 

branch(first x y, 1, 
       rule(first Z y = []), 
       branch(first (S n) y, 2, 
              rule(first (S n) (m:ms) = m : (first n ms)))) 

¹ Due to lack of space, we ignore the or nodes as well as the residuation of function 
calls used in Curry [8]. Of course, they have been considered in our implementation 
of the UPV-Curry interpreter. 


                  first x y 
                 /         \ 
     first Z y = []      first (S n) y 
                              | 
            first (S n) (m:ms) = m : (first n ms) 

Fig. 1. Definitional tree for the function first 



Figure 1 shows the graphical representation of this definitional tree. 

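A function $\lambda$ from terms and definitional trees to sets of triples 
$(p, R, \sigma)$ determines each evaluation step. Here, $p$ is a position in $t$, 
$R$ is the rule to be applied, and $\sigma$ is a substitution which should be applied 
to $t$ before reducing it. Given a term $t$ and a definitional tree $\mathcal{T}$ 
such that $pattern(\mathcal{T}) \le t$ [5]: 

$$
\lambda(t, \mathcal{T}) \ni
\begin{cases}
(\Lambda, l \to r, id) & \text{if } \mathcal{T} = rule(l = r); \\[4pt]
(p, l \to r, \sigma \circ \tau) & \text{if } \mathcal{T} = branch(\pi, o, \mathcal{T}_1, \ldots, \mathcal{T}_k),\ t|_o = x \in \mathcal{X}, \\
 & \tau = \{x \mapsto c_i(\overline{x_n})\},\ pattern(\mathcal{T}_i) = \pi[c_i(\overline{x_n})]_o, \\
 & \text{and } (p, l \to r, \sigma) \in \lambda(\tau(t), \mathcal{T}_i); \\[4pt]
(p, l \to r, \sigma) & \text{if } \mathcal{T} = branch(\pi, o, \mathcal{T}_1, \ldots, \mathcal{T}_k),\ t|_o = c_i(\overline{t_n}), \\
 & pattern(\mathcal{T}_i) = \pi[c_i(\overline{x_n})]_o, \\
 & \text{and } (p, l \to r, \sigma) \in \lambda(t, \mathcal{T}_i); \\[4pt]
(o.p, l \to r, \sigma) & \text{if } \mathcal{T} = branch(\pi, o, \mathcal{T}_1, \ldots, \mathcal{T}_k),\ t|_o = f(\overline{t_n}) \text{ for } f \in \mathcal{F}, \\
 & \mathcal{T}' \text{ is a definitional tree for } f, \\
 & \text{and } (p, l \to r, \sigma) \in \lambda(t|_o, \mathcal{T}').
\end{cases}
$$

If $(p, l \to r, \sigma) \in \lambda(t, \mathcal{T})$, then 
$t \leadsto_{p,\, l \to r,\, \sigma} s$ is a valid narrowing step, i.e., 
$l \le \sigma(t|_p)$ and $s = \sigma(t[r]_p)$, which is called a needed narrowing step. 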
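
The four cases of the strategy can be mirrored in a short Python sketch. It is an illustration under several assumptions, not the UPV-Curry implementation: terms are nested tuples (variables are strings), definitional trees are tagged tuples, fresh variables are generated with a counter, or nodes and residuation are ignored, and the rule in each triple is reported with the variable names used in the tree rather than a renamed variant.

```python
from itertools import count


def is_var(t):
    return isinstance(t, str)


def subterm(t, pos):
    for i in pos:
        t = t[1][i - 1]
    return t


def apply_subst(sigma, t):
    if is_var(t):
        return sigma.get(t, t)
    return (t[0], tuple(apply_subst(sigma, a) for a in t[1]))


def compose(sigma, tau):
    """Composition 'sigma after tau': tau is applied first."""
    result = {v: apply_subst(sigma, s) for v, s in tau.items()}
    for v, s in sigma.items():
        result.setdefault(v, s)
    return result


def pattern(tree):
    # ("rule", lhs, rhs) or ("branch", pattern, position, children)
    return tree[1]


_fresh = count(1)  # source of fresh variable names (an assumption of this sketch)


def lam(t, tree, trees):
    """Yield triples (position, (lhs, rhs), substitution) for term t."""
    if tree[0] == "rule":                       # case 1: a rule node
        yield (), (tree[1], tree[2]), {}
        return
    _, _, o, children = tree
    sub = subterm(t, o)
    if is_var(sub):                             # case 2: instantiate the variable
        for child in children:
            c, c_args = subterm(pattern(child), o)
            tau = {sub: (c, tuple(f"_v{next(_fresh)}" for _ in c_args))}
            for p, rule, sigma in lam(apply_subst(tau, t), child, trees):
                yield p, rule, compose(sigma, tau)
    elif sub[0] in trees:                       # case 4: a defined function at o
        for p, rule, sigma in lam(sub, trees[sub[0]], trees):
            yield o + p, rule, sigma
    else:                                       # case 3: a constructor at o
        for child in children:
            if subterm(pattern(child), o)[0] == sub[0]:
                yield from lam(t, child, trees)


# Definitional tree of the function first from Example 2
Z, NIL = ("Z", ()), ("[]", ())
def S(t): return ("S", (t,))
def cons(h, t): return (":", (h, t))
def first(a, b): return ("first", (a, b))

FIRST_TREE = ("branch", first("x", "y"), (1,), [
    ("rule", first(Z, "y"), NIL),
    ("branch", first(S("n"), "y"), (2,), [
        ("rule", first(S("n"), cons("m", "ms")), cons("m", first("n", "ms"))),
    ]),
])
TREES = {"first": FIRST_TREE}

steps = list(lam(first(S(Z), "zs"), FIRST_TREE, TREES))
nested = list(lam(first(S(Z), first(Z, "w")), FIRST_TREE, TREES))
both = list(lam(first("a", "b"), FIRST_TREE, TREES))
```

For the goal first (S Z) zs the sketch yields a single step at the root that binds zs to a list cell with fresh variables; for first (S Z) (first Z w) the step is located at position 2, because the second argument must be evaluated first; for first a b both rules are reachable, each with its own bindings.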



4 UPV-Curry 

UPV-Curry is a novel interpreter of Curry which is publicly available from 
the URL: http://www.dsic.upv.es/users/elp/soft.html. UPV-Curry provides an 
almost complete implementation of Curry according to [8] (it lacks Curry mod- 
ules and encapsulated search). The working environment of UPV-Curry provides 
the following facilities [7]: 

— a small, self-contained Curry implementation, which constitutes a portable 
stand-alone SICStus Prolog application. 

— a read-eval-print loop for displaying the solution for each expression which 
is entered as an input to the interpreter. 

— simple browsing commands, which are able to obtain the type and the defi- 
nitional tree of each function (see below). 

— a debugging command, which permits tracing the evaluation of expressions. 



4.1 Incremental Definitional Trees 



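The implementation of UPV-Curry is based on a simpler representation of 
definitional trees. Given a program $\mathcal{R}$, an incremental definitional tree 
$\mathcal{I}$ with pattern $\pi$ is an expression of the form: 

$\mathcal{I} = irule(\pi = r')$ where $\pi = r'$ is a variant of a program equation 
$l = r \in \mathcal{R}$. 
$\mathcal{I} = ibranch(o, (c_1, \mathcal{I}_1), \ldots, (c_n, \mathcal{I}_n))$ where 
$\pi|_o \in \mathcal{X}$, $c_1, \ldots, c_n$ are constructors for $n > 0$, and each 
$\mathcal{I}_i$ is an incremental definitional tree with pattern 
$\pi[c_i(x_1, \ldots, x_k)]_o$ where $k$ is the arity of $c_i$, and 
$x_1, \ldots, x_k$ are new variables. 

The pattern of an incremental definitional tree can be obtained as follows: 

$$
pattern(\mathcal{I}) =
\begin{cases}
\pi & \text{if } \mathcal{I} = irule(\pi = r); \\
pattern(\mathcal{I}_i)[x]_o & \text{if } \mathcal{I} = ibranch(o, (c_1, \mathcal{I}_1), \ldots, (c_n, \mathcal{I}_n)) \\
 & \text{and } x \notin Var(pattern(\mathcal{I}_i)).
\end{cases}
$$

Standard and incremental definitional trees are related by a function $\rho$: 

$$
\rho(\mathcal{T}) =
\begin{cases}
irule(l = r) & \text{if } \mathcal{T} = rule(l = r); \\
ibranch(o, (c_1, \rho(\mathcal{T}_1)), \ldots, (c_k, \rho(\mathcal{T}_k))) & \text{if } \mathcal{T} = branch(\pi, o, \mathcal{T}_1, \ldots, \mathcal{T}_k) \\
 & \text{and } pattern(\mathcal{T}_i) = \pi[c_i(\overline{x_n})]_o,\ 1 \le i \le k.
\end{cases}
$$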



Example 3. The definitional tree of Example 2 is represented as follows: 

ibranch(1, (Z, irule(first Z y = [])), 
           (S, ibranch(2, (:, irule(first (S n) (m:ms) = m : (first n ms)))))) 

Figure 2 shows the corresponding graphical representation. 



                      1 
                  Z /   \ S 
     first Z y = []       2 
                          | : 
        first (S n) (m:ms) = m : (first n ms) 

Fig. 2. Incremental definitional tree for the function first 



In [3], we connect incremental definitional trees and different data structures 
which are used to guide needed reductions in functional programming. 
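
The translation from standard to incremental definitional trees, and the recovery of a pattern from an incremental tree, can be sketched in Python as follows. The tuple encoding of terms and trees, the function names `rho` and `ipattern`, and the counter used for fresh variable names are assumptions made for this illustration only.

```python
from itertools import count


def subterm(t, pos):
    for i in pos:
        t = t[1][i - 1]
    return t


def replace(t, pos, s):
    if not pos:
        return s
    sym, args = t
    i = pos[0]
    return (sym, args[:i - 1] + (replace(args[i - 1], pos[1:], s),) + args[i:])


def rho(tree):
    """Translate ("rule", lhs, rhs) / ("branch", pattern, o, children) trees.

    Branch nodes lose their pattern; each child is labelled instead with the
    constructor found at position o of the child's pattern.
    """
    if tree[0] == "rule":
        return ("irule", tree[1], tree[2])
    _, _, o, children = tree
    return ("ibranch", o, [(subterm(c[1], o)[0], rho(c)) for c in children])


_fresh = count(1)  # fresh names; assumed not to clash with program variables


def ipattern(itree):
    """Recover the pattern: take any child's pattern and put a new variable at o."""
    if itree[0] == "irule":
        return itree[1]
    _, o, branches = itree
    return replace(ipattern(branches[0][1]), o, f"_x{next(_fresh)}")


# The definitional tree of first (Example 2) and its incremental version
Z, NIL = ("Z", ()), ("[]", ())
def S(t): return ("S", (t,))
def cons(h, t): return (":", (h, t))
def first(a, b): return ("first", (a, b))

FIRST_TREE = ("branch", first("x", "y"), (1,), [
    ("rule", first(Z, "y"), NIL),
    ("branch", first(S("n"), "y"), (2,), [
        ("rule", first(S("n"), cons("m", "ms")), cons("m", first("n", "ms"))),
    ]),
])

incremental = rho(FIRST_TREE)
recovered = ipattern(incremental)
print(incremental)
print(recovered)
```

Applied to the tree of first, the sketch produces the nested structure of Example 3: a branch on position 1 with children labelled Z and S, the latter branching on position 2 with a single child labelled by the list constructor.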



4.2 Incremental Evaluation 

Let $e_1 \leadsto_{(p_1, R_1, \sigma_1)} e_2 \leadsto \cdots$ be a needed narrowing 
sequence, where $(p_i, R_i, \sigma_i) \in \lambda(e_i, \mathcal{T}_i)$ and 
$\mathcal{T}_i$ is a definitional tree for $root(e_i)$, $i = 1, \ldots, n$. 
In the case when $e_{i+1}|_{p_i}$ is operation-rooted, we have that 
$p_{i+1} \ge p_i$, hence we take 
$(p_{i+1}, R_{i+1}, \sigma_{i+1}) = (p_i.p, R, \sigma)$, where 
$(p, R, \sigma) \in \lambda(e_{i+1}|_{p_i}, \mathcal{T})$ and $\mathcal{T}$ is 
a definitional tree for $root(e_{i+1}|_{p_i})$. If $e_{i+1}|_{p_i}$ is 
constructor-rooted or a variable, then it is necessary to resume the evaluation 
someplace above $p_i$, hence we go back to $q$ (where $q < p_i$ is the position of 
the defined symbol immediately above $p_i$) by taking the (‘most defined’) 
definitional subtree $\mathcal{T}'_i$ of the definitional tree of $root(e_i|_q)$ 
which has been used in the computation of $(p_i, R_i, \sigma_i)$ immediately before 
obtaining $\mathcal{T}_i$. Thus, we let 
$(p_{i+1}, R_{i+1}, \sigma_{i+1}) = (q.p, R, \sigma)$, where 
$(p, R, \sigma) \in \lambda(e_{i+1}|_q, \mathcal{T}'_i)$. This completes an 
incremental definition of $\lambda$. 

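We use a list $\mathcal{L}$ which contains the information needed to perform the 
needed narrowing steps incrementally; $\mathcal{L}$ is a list of pairs 
$(p, \mathcal{I})$ where $p$ is a position and $\mathcal{I}$ is an incremental 
definitional tree. The incremental needed narrowing strategy is denoted by 
$\lambda^{I}$. Given an operation-rooted term $t$ and a list 
$\mathcal{L} = [(q, \mathcal{I}), \ldots]$ with $pattern(\mathcal{I}) \le t|_q$, we 
compute $(p, R, \sigma, \mathcal{L}') \in \lambda^{I}(t, \mathcal{L})$, where 
$(p, R, \sigma)$ describes a needed narrowing step and $\mathcal{L}'$ is a list 
containing the information which allows us to proceed with the subsequent evaluation 
steps incrementally. Formally, 

$$
\lambda^{I}(t, [(q, \mathcal{I}) \mid \mathcal{L}]) \ni
\begin{cases}
(q, l \to r, id, \mathcal{L}') & \text{if } \mathcal{I} = irule(l = r) \text{ and } \\
 & \mathcal{L}' = \begin{cases} [(q, \mathcal{I}') \mid \mathcal{L}] & \text{if } \theta(r) \text{ is operation-rooted} \\ \mathcal{L} & \text{otherwise} \end{cases} \\
 & \text{where } t|_q = \theta(l) \text{ and } \mathcal{I}' \text{ is an incremental} \\
 & \text{definitional tree for } root(\theta(r)); \\[4pt]
(p, l \to r, \sigma \circ \tau, \mathcal{L}') & \text{if } \mathcal{I} = ibranch(o, (c_1, \mathcal{I}_1), \ldots, (c_n, \mathcal{I}_n)), \\
 & t|_{q.o} = x \in \mathcal{X},\ \tau = \{x \mapsto c_i(\overline{x_k})\}, \\
 & \text{and } (p, l \to r, \sigma, \mathcal{L}') \in \lambda^{I}(\tau(t), [(q, \mathcal{I}_i) \mid \mathcal{L}]); \\[4pt]
(p, l \to r, \sigma, \mathcal{L}') & \text{if } \mathcal{I} = ibranch(o, (c_1, \mathcal{I}_1), \ldots, (c_n, \mathcal{I}_n)), \\
 & t|_{q.o} = c_i(\overline{t_k}), \\
 & \text{and } (p, l \to r, \sigma, \mathcal{L}') \in \lambda^{I}(t, [(q, \mathcal{I}_i) \mid \mathcal{L}]); \\[4pt]
(p, l \to r, \sigma, \mathcal{L}') & \text{if } \mathcal{I} = ibranch(o, (c_1, \mathcal{I}_1), \ldots, (c_n, \mathcal{I}_n)), \\
 & t|_{q.o} = f(\overline{t_k}) \text{ for } f \in \mathcal{F}, \\
 & \mathcal{I}' \text{ is an incremental definitional tree for } f, \\
 & \text{and } (p, l \to r, \sigma, \mathcal{L}') \in \lambda^{I}(t, [(q.o, \mathcal{I}'), (q, \mathcal{I}) \mid \mathcal{L}]).
\end{cases}
$$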



The needed narrowing steps performed by $\lambda^{I}$ and the computation steps 
carried out by the original needed narrowing strategy $\lambda$ are equivalent. 

Theorem 1. Let $\mathcal{R}$ be an inductively sequential program, 
$e_1 \leadsto e_2 \leadsto \cdots \leadsto e_n$ be a needed narrowing derivation 
where $\mathcal{T}_i$ is a definitional tree such that 
$pattern(\mathcal{T}_i) \le e_i$ for all $1 \le i \le n$, and 
$\mathcal{L}_0 = [(\Lambda, \rho(\mathcal{T}_1))]$. Then, for all $1 \le i \le n$, 
$(p_i, R_i, \sigma_i) \in \lambda(e_i, \mathcal{T}_i)$ if and only if 
$(p_i, R_i, \sigma_i, \mathcal{L}_i) \in \lambda^{I}(e_i, \mathcal{L}_{i-1})$. 



UPV-Curry: An Incremental Curry Interpreter 337 



Table 1. Runtime goals and comparison with other Curry interpreters (in ms.) 

Benchmark   Goal                             UPV-Curry   TasteCurry   PACS-TasteCurry 
iter        iter 100 sub1 100                      323         1174               909 
ackermann   ackermann 20                           640         2505              1203 
mergesort   sort (intMerge) [3,1,2] xs             313          870               442 
quicksort   qsort [10,9,8,7,6,5,4,3,2,1,0]        2476         1829              1511 
fibonacci   fibonacci 10                           576         2262              1106 
horseman    horseman x y 8 20                     2655         3542              1658 
last        last [1,...]_100                       990        58736              4256 



5 Experimental Results 

In [3], we provide experimental evidence that all the modifications 
described in the previous sections are actually (and independently) effective, by 
comparing a preliminary, non-optimized version of UPV-Curry with different 
versions of the interpreter that include incremental definitional trees and 
incremental evaluation. In this section, we compare UPV-Curry with other Curry 
interpreters. Table 1 provides the runtimes of the benchmarks for the UPV-Curry, 
TasteCurry, and PACS-TasteCurry interpreters². Times were measured on a SUN 
SparcStation running UNIX System V Release 4.0. They are expressed in milliseconds 
and are the average of 10 executions. Most of the benchmarks used for the analysis 
(namely, quicksort, last, horseman, and mergesort) are standard Curry 
test programs³. The benchmarks ackermann, fibonacci, and iter are given in 
Table 2. The functions ackermann, fibonacci, last, quicksort and mergesort 
are well-known; horseman computes the number of men and horses that have 
a certain number of heads and feet; finally, iter produces a sequence of n nested 
calls to a given function. Natural numbers are implemented using Z/S-terms, and 
lists are shown in goals using a subindex which represents their size. The figures 
in Table 1 show that our UPV-Curry implementation performs very well in 
comparison to the other Curry interpreters. Overall, it takes about 80 % of the time 
needed by PACS-TasteCurry and about 38 % of the time needed by TasteCurry to 
evaluate the queries. These results seem to substantiate the advantages of using 
the proposed incremental techniques. A more detailed comparison among Curry 
implementations, including more benchmark examples, is underway. 

6 Conclusions 

We have presented UPV-Curry, a novel and effective implementation of Curry 
which provides an almost complete implementation of the language. In order to 
improve performance, we have implemented (and proved correct) an incremental 
(re)definition of the basic evaluation strategy of Curry which is able to locally 
resume the search for a new redex after performing a previous incremental 
narrowing step, and also reduces the number of term positions to be considered in 
a needed narrowing derivation. Finally, we have provided the (first) experimental 
comparison of different implementations of existing Curry interpreters. 

² TasteCurry and PACS-TasteCurry are available from 
http://www-i2.informatik.rwth-aachen.de/~hanus/curry/. 

³ See http://www-i2.informatik.rwth-aachen.de/~hanus/curry/examples/ 
for a collection of examples containing these and other available Curry test programs. 